neetpiq

My two cents about software development on the web


Cloud Platforms


  • Windows Server 2012 R2, IIS 8.5, WebSockets and .NET 4.5.2 (AppHarbor Blog)

    During the last couple of weeks we've upgraded worker servers in the US and EU regions to support Windows Server 2012 R2, IIS 8.5 and .NET 4.5.2. Major upgrades like this can be risky and lead to compatibility issues, so the upgrade was carefully planned and executed to maximize compatibility with running applications. Application performance and error rates have been closely monitored throughout the process and fortunately, chances are you haven't noticed a thing: we've detected migration-related issues with less than 0.1% of running applications.

    Many of the new features and configuration improvements enabled by this upgrade will be gradually introduced over the coming months. This way we can ensure a continued painless migration and maintain compatibility with the previous Windows Server 2008 R2/IIS 7.5 setup, while we iron out any unexpected kinks if and when they crop up. A few changes, however, have already been deployed that we want to fill you in on.

    WebSocket support and the beta region

    Last year the beta region featuring experimental WS2012 and WebSockets support was introduced. The beta region allowed customers to test existing and new apps on the new setup while we prepared and optimized it for production use. This approach has been an important factor in learning about subtle differences between the server versions, and addressing pretty much all compatibility issues before upgrading the production regions. Thanks to all the customers who provided valuable feedback during the beta and helped ensure a smoother transition for everyone.

    An important reason for the server upgrade was to support WebSocket connections. Now that the worker servers are running WS2012 and IIS 8.5 we've started doing just that. Applications in the old beta region have been merged into the production US region and the beta region is no longer available when you create a new application.

    Most load balancers already support WebSockets and the upgrade is currently being rolled out to remaining load balancers. Apps created since August 14th fully support WebSockets and no configuration is necessary: AppHarbor will simply detect and proxy connections as expected when a client requests a Connection: Upgrade.

    Some libraries, such as SignalR, will automatically detect and prefer WebSocket connections when supported by both the server and client. Until WebSocket connections are supported on all load balancers some apps may attempt the WebSocket handshake and fail. This should not cause issues since these libraries will fall back to other supported transports, and affected apps will automatically be WebSocket-enabled once their load balancers support it.
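
    As a rough sketch of what this looks like in practice, here is a minimal SignalR 2.x hub and OWIN startup class. The hub and method names are made up for illustration; SignalR negotiates the best available transport (WebSockets where supported, otherwise server-sent events or long polling) without any transport-specific code on your part.

    using Microsoft.AspNet.SignalR;
    using Owin;

    public class Startup
    {
        public void Configuration(IAppBuilder app)
        {
            // Registers the SignalR hubs at the default /signalr endpoint.
            // Transport selection (WebSockets vs. fallbacks) is negotiated per client.
            app.MapSignalR();
        }
    }

    public class ChatHub : Hub
    {
        public void Send(string message)
        {
            // Broadcast to all connected clients over whichever transport they negotiated.
            Clients.All.broadcast(message);
        }
    }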

    CPU throttling

    One of the major challenges that has held back this upgrade is a change in the way we throttle worker CPU usage. CPU limitations are the same as before, but the change can affect how certain CPU-intensive tasks are executed. Resources and documentation on this subject are limited, but testing shows that CPU time is more evenly scheduled across threads, leading to higher concurrency, consistency and stability within processes. While this is overall an improvement it can also affect peak performance on individual threads, and we're currently investigating various approaches to better support workloads affected by this.

    For the curious, we previously used a CPU rate limit registry setting to limit CPU usage per user account, but this is no longer supported on Windows Server 2012. We now use a combination of IIS 8's built-in CPU throttling and a new CPU rate control for job objects to throttle background workers.
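
    For illustration, here is a minimal, hypothetical sketch of the job object CPU rate control mentioned above, using the Windows API introduced with Windows 8/Server 2012. This is not AppHarbor's actual throttling code, just an outline of how a process tree can be hard-capped at a CPU percentage via P/Invoke from C#.

    using System;
    using System.ComponentModel;
    using System.Runtime.InteropServices;

    static class JobCpuRateControl
    {
        // JOBOBJECTINFOCLASS value for CPU rate control information.
        const int JobObjectCpuRateControlInformation = 15;
        const uint JOB_OBJECT_CPU_RATE_CONTROL_ENABLE = 0x1;
        const uint JOB_OBJECT_CPU_RATE_CONTROL_HARD_CAP = 0x4;

        [StructLayout(LayoutKind.Sequential)]
        struct JOBOBJECT_CPU_RATE_CONTROL_INFORMATION
        {
            public uint ControlFlags;
            public uint CpuRate; // percentage of total CPU capacity, multiplied by 100
        }

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern IntPtr CreateJobObject(IntPtr lpJobAttributes, string lpName);

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool SetInformationJobObject(IntPtr hJob, int infoClass,
            ref JOBOBJECT_CPU_RATE_CONTROL_INFORMATION info, int cbLength);

        // Creates a job object whose processes are hard-capped at cpuPercent of total CPU.
        public static IntPtr CreateThrottledJob(uint cpuPercent)
        {
            var job = CreateJobObject(IntPtr.Zero, null);
            var info = new JOBOBJECT_CPU_RATE_CONTROL_INFORMATION
            {
                ControlFlags = JOB_OBJECT_CPU_RATE_CONTROL_ENABLE
                             | JOB_OBJECT_CPU_RATE_CONTROL_HARD_CAP,
                CpuRate = cpuPercent * 100
            };

            if (!SetInformationJobObject(job, JobObjectCpuRateControlInformation,
                    ref info, Marshal.SizeOf(typeof(JOBOBJECT_CPU_RATE_CONTROL_INFORMATION))))
            {
                throw new Win32Exception();
            }

            return job; // assign worker processes with AssignProcessToJobObject
        }
    }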

    If you've experienced any issues with this upgrade or have feedback about the process, please don't hesitate to reach out.

  • Heartbleed Security Update (AppHarbor Blog)

    Updated on April 10, 2014 with further precautionary steps in the "What can you do" section below.

    On April 7, 2014, a serious vulnerability in the OpenSSL library (CVE-2014-0160) was publicly disclosed. OpenSSL is a cryptography library used for the majority of private communications across the internet.

    The vulnerability, nicknamed "Heartbleed", would allow an attacker to steal secret certificate keys, user names and passwords, and other secrets protected by the OpenSSL library. As such it represents a major risk for a large number of internet applications and services, including AppHarbor.

    What has AppHarbor done about this

    AppHarbor responded to the announcement by immediately taking steps to remediate the vulnerability:

    1. We patched all affected components with the fixed, secure version of OpenSSL within the first few hours of the announcement. This included SSL endpoints and load balancers, as well as other infrastructure components used internally at AppHarbor.
    2. We re-keyed and redeployed all potentially affected AppHarbor SSL certificates (including the piggyback *.apphb.com certificate), and the old certificates are being revoked.
    3. We notified customers with custom SSL certificates last night, so they could take steps to re-key and reissue certificates, and have the old ones revoked.
    4. We reset internal credentials and passwords.
    5. User session cookies were revoked, requiring all users to sign in again.

    Furthermore, AppHarbor validates session cookies against your previously known IP addresses as part of the authorization process. This has reduced the risk of a stolen session cookie being abused. Perfect forward secrecy was deployed to some load balancers, making it impossible to decrypt intercepted communication with stolen keys. Forward secrecy has since been deployed to all load balancers hosted by AppHarbor.

    What can you do

    We have found no indication that the vulnerability was used to attack AppHarbor. By quickly responding to the issue and taking the steps mentioned above we effectively stopped any further risk of exposure. However, due to the nature of this bug, we recommend users who want to be extra cautious to take the following steps:

    1. Reset your AppHarbor password.
    2. Review the sign-in and activity history on your user page for any suspicious activity.
    3. Revoke authorizations for external applications that integrate with AppHarbor.
    4. Recreate, reissue and reinstall any custom SSL certificates you may have installed, and revoke the old ones (see the example after this list). Note that reissuing may automatically revoke the old certificates, so make sure you're ready to install the new ones.
    5. Read the details about the Heartbleed bug here and assess the risks relative to your content.
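
    If you use a typical OpenSSL-based workflow, re-keying usually means generating a new private key and a new certificate signing request, then submitting the CSR to your certificate authority. The commands and file names below are illustrative; your CA's exact process may differ:

    openssl genrsa -out example.com.key 2048
    openssl req -new -key example.com.key -out example.com.csr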

    Updated instructions (April 10, 2014):

    While we still have not seen any abuse on AppHarbor as a result of this bug, we now also encourage you to take these precautionary steps:

    1. Reset your build URL token.
    2. If you're using one of the SQL Server or MySQL add-ons: Reset the database password. Go to the add-on's admin page and click the "Reset Password" button. This will immediately update the configuration on AppHarbor and redeploy the application (with a short period of downtime until it is redeployed).
    3. If you're using the Memcacher add-on: Reinstall the add-on by uninstalling and installing it.
    4. Rotate/update sensitive information in your own configuration variables.

    If you have hardcoded passwords/connection strings for any of your add-ons this is a good opportunity to start using the injected configuration variables. You can find instructions for the SQL add-ons here and the Memcacher add-on here. This way your application is automatically updated when you reset the add-ons, or when an add-on provider updates the configuration. If this is not an option you should update your code/configuration files immediately and redeploy the application after the configuration is updated.
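
    As a minimal sketch, reading the injected values from .NET configuration might look like the following. The key names here are hypothetical; use the variable names listed on your add-on pages, since AppHarbor rewrites matching appSettings and connectionStrings entries when the application is deployed:

    using System.Configuration;

    // Hypothetical keys: the real names come from the add-on's page/documentation.
    var sqlConnectionString =
        ConfigurationManager.ConnectionStrings["SqlServerConnectionString"].ConnectionString;
    var memcachedServers = ConfigurationManager.AppSettings["MEMCACHER_SERVERS"];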

    Stay tuned

    Protecting your code and data is our top priority, and we continue to remediate and assess the risks in response to this issue. We'll keep you posted with any new developments, so stay tuned on Twitter and the blog for important updates. We're of course also standing by on the support forums if you have any questions or concerns.

  • Librato integration and built-in performance metrics (AppHarbor Blog)

    Librato Dashboard

    Being able to monitor and analyze key application metrics is an essential part of developing stable, performant and high quality web services that meet your business requirements. Today we're announcing a great new set of features to provide a turnkey solution for visualizing, analyzing and acting on key performance metrics. On top of that we're enabling you to easily track your own operational metrics. In this blog post we'll look at how the pieces tie together.

    Librato integration

    The best part of today's release is our new integration with Librato for monitoring and analyzing metrics. Librato is an awesome and incredibly useful service that enables you to easily visualize and correlate metrics, including the new log-based performance metrics provided by AppHarbor (described in more detail below).

    Librato Dashboard

    Librato is now available as an add-on and integrates seamlessly with your AppHarbor logs. When you provision the add-on, Librato will set up a preconfigured dashboard tailored for displaying AppHarbor performance data, and you can access it immediately by going to the Librato admin page. Everything works out of the box without any further configuration and your logs will automatically be sent to Librato using a log drain.

    When log messages containing metric data are sent to Librato they're transformed by an l2met service before being sent to Librato's regular API. A very cool feature of the l2met service is that it can automatically calculate some useful metrics. For instance, it'll calculate the median response time as well as the 99th and 95th percentiles of measurements such as response times. The perc99 response time is the time under which 99% of responses complete. It can be useful to know this value since it's less affected by a few very slow responses than the average is. Among other things this provides a good measurement of the browsing experience for most of your users.

    Librato Dashboard

    The l2met project was started by Ryan Smith - a big shout-out and thanks to him and the Librato team for developing this great tool.

    For more information about how to integrate with Librato and details about the service please refer to the documentation here. Also check out their announcement blog post about the integration.

    Built-in performance metrics

    AppHarbor can now write key runtime performance metrics directly to your application’s log stream as l2met 2.0 formatted messages similar to this:

    source=web.5 sample#memory.private_bytes=701091840
    source=web.5 sample#process.handles=2597
    source=web.5 sample#cpu.load_average=1.97
    

    These are the messages Librato uses as well and most of them are written every 20 seconds. They allow for real-time monitoring of worker-specific runtime metrics such as CPU (load average) and memory usage, as well as measurements of response time and size reported from the load balancers. Because these metrics are logged to your log stream you can also consume them in the same way you’d usually view or integrate with your logs.

    Load average run-time metrics

    Performance data collection takes place completely out-of-process, without using a profiler, and it can be enabled and disabled without redeploying the application. This means that monitoring won’t impact application performance at all and that a profiler (such as New Relic) can still be attached to the application.

    Writing custom metrics

    The performance data provided by AppHarbor are probably not the only metrics you want to track. You can of course integrate directly with Librato's API, but the l2met integration makes it easier than ever to track your own metrics, and the paid Librato plans include the ability to track custom metrics exactly for that purpose.

    You can start writing your own metrics simply by sending an l2met-formatted string to your logs. Last week we introduced the Trace Logging feature which is perfect for this, so writing your custom metrics can now be done with a simple trace:

    Trace.TraceInformation("measure#twitter.lookup.time=433");
    

    To make this even easier we've built the metric-reporter library (a .NET port of Librato's log-reporter) to provide an easy-to-use interface for writing metrics to your log stream. You can install it with NuGet:

    Install-Package MetricReporter
    

    Then initialize a MetricReporter which writes to a text writer:

    var writer = new L2MetWriter(new TraceTextWriter());
    var reporter = new MetricReporter(writer);
    

    And start tracking your own custom metrics:

    reporter.Increment("jobs.completed");
    reporter.Measure("payload.size", 21276);
    reporter.Measure("twitter.lookup.time", () =>
    {
        //Do work
        twitterRequest.GetResponse();
    });
    

    On Librato you can then view charts with these new metrics along with the performance metrics provided by AppHarbor, and add them to your dashboards, aggregate and correlate data, set up alerts etc. The MetricReporter library will take care of writing l2met-formatted metrics using the appropriate metric types and write to the trace or another IO stream. Make sure to inspect the README for more examples and information on configuration and usage.

    That’s all we have for today. There’ll be more examples on how you can use these new features soon, but for now we encourage you to take it for a spin, install the Librato add-on and test the waters for yourself. We’d love to hear what you think so if there are other metrics you’d like to see or if you experience any issues please hit us up through the usual channels.

  • Introducing Trace Logging (AppHarbor Blog)

    Today we’re happy to introduce trace message integration with your application log. With tracing you can very easily log trace messages to your application's log stream by using the built-in tracing capabilities of the .NET framework from anywhere in your application.

    When introducing the realtime logging module a while back we opened up access to collated log data from load balancers, the build and deploy infrastructure, background workers and more. Notably missing however was the ability to log from web workers. We’re closing that gap with tracing, which can be used in both background and web workers.

    How to use it

    The trace feature integrates with standard .NET tracing, so you don’t have to make any changes to your application to use it. You can simply log traces from your workers with the System.Diagnostics.Trace class:

    Trace.TraceInformation("Hello world");
    

    This will yield a log message containing a timestamp and the source of the trace in your application’s log like so:

    2014-01-22T06:46:48.086+00:00 app web.1 Hello world
    

    You can also use a TraceSource by specifying the trace source name AppHarborTraceSource:

    var traceSource = new TraceSource("AppHarborTraceSource", defaultLevel: SourceLevels.All);
    traceSource.TraceEvent(TraceEventType.Critical, 0, "Foo");
    

    You may not always want noisy trace messages in your logs and you can configure the trace level on the "Logging" page. There are 4 levels: All, Warning, Error and None. Setting the trace level will update the configuration without redeploying or restarting the application. This is often desirable if you need to turn on tracing when debugging and diagnosing an ongoing or state-related issue.

    Configure Trace level

    There are a number of other ways to use the new tracing feature including:

    • ASP.NET health monitoring (for logging exceptions, application lifecycle events etc).
    • A logging library such as NLog (Trace target) or log4net (TraceAppender).
    • Integrating with ETW (Event Tracing for Windows) directly using the injected event provider id.

    Anything that integrates with .NET tracing or ETW should work, and you can find more details and examples in this knowledge base article.
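
    As one illustration of the logging-library route, here is a minimal sketch that configures NLog programmatically with its Trace target so log events flow into the trace pipeline described above; the target name and layout are arbitrary choices for the example:

    using NLog;
    using NLog.Config;
    using NLog.Targets;

    var config = new LoggingConfiguration();

    // Forward NLog events to System.Diagnostics.Trace, which AppHarbor picks up.
    var traceTarget = new TraceTarget { Layout = "${logger}: ${message}" };
    config.AddTarget("trace", traceTarget);
    config.LoggingRules.Add(new LoggingRule("*", LogLevel.Info, traceTarget));

    LogManager.Configuration = config;

    var logger = LogManager.GetCurrentClassLogger();
    logger.Info("Hello from NLog");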

    All new applications have tracing enabled by default. Tracing can be enabled for existing applications on the "Logging" page.

    How does it work

    Under the hood we’re using ETW for delivering log messages to the components that are responsible for sending traces to your log stream. Application performance is unaffected by the delivery of log messages as this takes place completely out of process. Note however that messages are buffered for about a second and that some messages may be dropped if you’re writing excessively to the trace output.

    When tracing is enabled, AppHarbor configures your application with an EventProviderTraceListener as a default trace listener. While you can integrate directly with ETW as well we recommend using the Trace or TraceSource approaches described above.

    Viewing trace messages

    Traces are collated with other logging sources in your log stream, so you can consume them in the same way you’re used to. You can view log messages using the command line interface, the web viewer or set up a log drain to any HTTP, HTTPS or syslog endpoint. For more information about the various integration points please refer to this article.

    Viewing trace messages in console

    We've got a couple of cool features that build on this coming soon, so stay tuned and happy tracing!

  • .NET 4.5.1 is ready (AppHarbor Blog)

    Microsoft released .NET 4.5.1 a while back, bringing a bunch of performance improvements and new features to the framework. Check out the announcement for the details.

    Over the past few weeks we have updated our build infrastructure and application servers to support this release. We're happy to report that AppHarbor now supports building, testing and running applications targeting the .NET 4.5.1 framework, as well as solutions created with Visual Studio 2013 and ASP.NET MVC 5 applications.

    There are no known issues related to this release. If you encounter problems, please refer to the usual support channels and we'll help you out.

    .NET logo

  • Integrated NuGet Package Restore (AppHarbor Blog)

    A few months ago the NuGet team released NuGet 2.7, which introduced a new approach to package restore. We recently updated the AppHarbor build process to adopt this approach and integrate the new NuGet restore command. AppHarbor will now automatically invoke package restore before building your solution.

    Automatically restoring packages is a recommended practice, especially because you don't have to commit the packages to your repository and can keep its footprint small. Until now we've recommended using the approach described in this blog post to restore NuGet packages when building your application. This has worked relatively well, but it's also a bit of a hack and has a few caveats:

    • Some NuGet packages rely on files that need to be present and imported when MSBuild is invoked. This has most notably been an issue for applications relying on the Microsoft.Bcl.Build package for the reasons outlined in this article.
    • NuGet.exe has to be committed to and maintained with the repository, and project and solution files need to be configured.
    • Package restore can intermittently fail in some cases when multiple projects are built concurrently.

    With this release we expect to eliminate these issues and provide a more stable, efficient and streamlined way of handling package restore.

    If necessary, NuGet can be configured by adding a NuGet.config file in the same directory as your solution file (or alternatively in a .nuget folder under your solution directory). You usually don't have to configure anything if you’re only using the official NuGet feed, but you’ll need to configure your application if it relies on other package sources. You can find an example configuration file which adds a private package source in the knowledge base article about package restore and further documentation for NuGet configuration files can be found here.
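
    As a rough sketch, a NuGet.config that adds a hypothetical private feed alongside the official one could look like this (the key and URL are placeholders):

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <packageSources>
        <!-- The official nuget.org feed is used by default; this adds a private feed. -->
        <add key="MyPrivateFeed" value="https://example.com/nuget/" />
      </packageSources>
    </configuration>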

    If you hit any snags we’re always happy to help on our support forums.

    NuGet logo

  • New Relic Improves Service and Reduces Price (AppHarbor Blog)

    New Relic

    We're happy to announce that New Relic has dropped the price of the Professional add-on plan from $45/month to $19/month per worker unit. Over the years New Relic has proven to be a really useful tool for many of our customers, and we're pleased that this price drop will make the features of New Relic Professional more accessible to everyone using AppHarbor.

    Highlights of the Professional plan include:

    • Unlimited data retention
    • Real User Monitoring (RUM) and browser transaction tracing
    • Application transaction tracing, including Key Transactions and Cross Application Tracing
    • Advanced SQL and slow SQL analysis

    You can find more information about the benefits of New Relic Pro on the New Relic website (http://newrelic.com/pricing/details).

    Service update

    The New Relic agent was recently upgraded to a newer version which brings support for some recently introduced features as well as a bunch of bug fixes. Time spent in the request queue is now reported and exposed directly in the New Relic interface. Requests are rarely queued for longer than a few milliseconds, but it can happen if your workers are under load. When more time is spent in the request queue it may be an indicator that you need to scale your application to handle the load efficiently.

    We're also making a few changes to the way the New Relic profiler is initialized with your applications. This is particularly relevant if you've subscribed to New Relic directly rather than installing the add-on through AppHarbor. Going forward you'll need to add a NewRelic.LicenseKey configuration variable to make sure the profiler is attached to your application. We recommend that you make this change as soon as possible. If you're subscribed to the add-on through AppHarbor no action is required and the service will continue to work as usual.

  • Found Elasticsearch add-on available (AppHarbor Blog)

    Found ElasticSearch

    Found provides fully hosted and managed Elasticsearch clusters; each cluster has reserved memory and storage ensuring predictable performance. The HTTPS API is developer-friendly and existing Elasticsearch libraries such as NEST, Tire, PyES and others work out of the box. The Elasticsearch API is unmodified, so for those with an existing Elasticsearch integration it is easy to get started.
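
    For example, connecting from .NET with NEST might look roughly like the sketch below; the cluster URL, credentials and index name are placeholders, and you would use the HTTPS endpoint shown on the add-on page:

    using System;
    using Nest;

    public class Post
    {
        public int Id { get; set; }
        public string Title { get; set; }
    }

    // Placeholder endpoint and index; copy the real values from the Found add-on page.
    var settings = new ConnectionSettings(
        new Uri("https://username:password@your-cluster.example.com:9243"),
        defaultIndex: "posts");
    var client = new ElasticClient(settings);

    // Index a document and search for it again.
    client.Index(new Post { Id = 1, Title = "Hello from Found" });
    var result = client.Search<Post>(s => s
        .Query(q => q.Match(m => m.OnField(p => p.Title).Query("hello"))));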

    For production and mission critical environments customers can opt for replication and automatic failover to a secondary site, protecting the cluster against unplanned downtime. Security is a strong focus: communication to and from the service is securely transmitted over HTTPS (SSL) and data is stored behind multiple firewalls and proxies. Clusters run in isolated containers (LXC) and customisable ACLs allow for restricting access to trusted people and hosts.

    In the event of a datacenter failure, search clusters are automatically failed over to a working datacenter or, in case of a catastrophic event, completely rebuilt from backup.

    Co-founder Alex Brasetvik says: "Found provides a solution for companies who are keen to use Elasticsearch but not overly keen to spend their time and money on herding servers! We provide our customers with complete cluster control: they can scale their clusters up or down at any time, according to their immediate needs. It's effortless and there's zero downtime."

    More information and price plans are available on the add-on page.

  • Introducing Realtime Logging (AppHarbor Blog)

    Today we're incredibly excited to announce the public beta of our brand new logging module. Starting immediately all new applications created on AppHarbor will have logging enabled. You can enable it for your existing apps on the new "Logging" page.

    We know all too well that running applications on a PaaS like AppHarbor sometimes can feel like a black box. So far we haven't had a unified, simple and efficient way to collate, present and distribute log events from the platform and your apps.

    That's exactly what we wanted to address with our logging solution, and based on the amazing feedback from private beta users we feel confident that you'll find it useful for getting insight about your application and AppHarbor. A big thanks to all the beta testers who have helped us refine and test these new features.

    The new logging module collates log messages from multiple sources, including almost all AppHarbor infrastructure components and your applications - API changes, load balancer request logs, build and deploy output, stdout/stderr from your background workers and more can now be accessed and sent to external services in real time.

    Captain's log: consider yourself lucky we're not that much into skeuomorphism.

    Interfaces

    We're providing two interfaces "out of the box": a convenient web interface can be accessed on the Logging page, and a new log command has been added to the CLI. Get the installer directly from here or install it with Chocolatey: cinst appharborcli.install. To start a "tailing" log session with the CLI you can for instance run appharbor log -t -s appharbor, and type appharbor log -h to see all options.

    The web interface works a bit differently, but try it out and let us know what you think - it's heavily inspired by the log.io project, which provides a great client-side interface for viewing, filtering, searching and splitting logs into multiple "screens".

    Log web interface

    Integration

    One of the most useful and interesting aspects of today's release is the flexible integration points it provides. Providing access to your logs in realtime is one thing, but AppHarbor will only store the last 1500 log messages for your application. Storing, searching, viewing and indexing logs can be fairly complex, and luckily many services already exist that help you make more sense of your log data.

    We've worked with Logentries to provide a completely automated and convenient way for sending AppHarbor logs to them when you add their add-on. When you add the Logentries add-on your application can automatically be configured to send logs to Logentries, and Logentries will be configured to display log messages in AppHarbor's format.

    Logentries integration

    You can also configure any syslog (TCP), HTTP and HTTPS endpoint you like with log "drains". You can use this to integrate with services like Loggly and Splunk, or even your own syslog server or HTTP service. More details about log drains are available in this knowledge base article and the drain API documentation.

    Finally there's a new Log session API endpoint that you can use to create sessions similar to the ones used by the interfaces we provide.

    Logplex

    If you've ever used Heroku you'll find most of these features very familiar. That's no coincidence - the backend is based on Heroku's awesome distributed syslog router, Logplex. Integrating with Logplex makes it a lot easier for add-on providers who already support Heroku's Logplex to integrate with AppHarbor, while giving us a scalable and proven logging backend to support thousands of deployed apps.

    Logplex is also in rapid, active development, and a big shout-out to the awesome people at Heroku who are building this incredibly elegant solution. If you're interested in learning more about Logplex we encourage you to check out the project on Github and try it for yourself. We've built a client library for interacting with Logplex's HTTP API and HTTP log endpoints from .NET apps - let us know if you'd like to use this and we'll be happy to open source the code. The Logplex documentation on stream management is also useful for a high-level overview of how Logplex works.

    Next steps

    With this release we've greatly improved the logging experience for our customers. We're releasing this public beta since we know it'll be useful to many of you as it is, but we're by no means finished. We want to add even more log sources, provide more information from the various infrastructure components and integrate with more add-on providers. Also note that request logs are currently only available on shared load balancers, but they will be rolled out to all load balancers soon. If you find yourself wanting some log data that is not currently available please let us know. We now have a solid foundation to provide you with the information you need when you need it, and we couldn't be more excited about that.

    We'll provide you with some examples and more documentation for these new features over the next couple of weeks, but for now we hope you'll take it for a spin and test the waters for yourself. Have fun!

  • Introducing PageSpeed optimizations (AppHarbor Blog)

    Today we're introducing a new experimental feature: Google PageSpeed optimizations support. The PageSpeed module is a suite of tools that tries to optimize web page latency and bandwidth usage of your websites by rewriting your content to implement web performance best practices. Reducing the number of requests to a single domain, optimizing cache policies and compressing content can significantly improve web performance and lead to a better user experience.

    With PageSpeed optimization filters we're making it easier to apply some of these best practices, and provide a solution that efficiently and effortlessly speeds up your web apps. The optimizations take place at the load balancer level and work for all web applications no matter what framework or language you use.

    As an example of how this works you can inspect the HTML and resources of this blog to see some of the optimizations that are applied. Analyzing blog.appharbor.com with the online PageSpeed insights tool yields a "PageSpeed score" of 88 when enabled versus 73 when disabled. Not too bad considering it only took a click to enable it.

    PageSpeed button

    You can enable PageSpeed optimizations for your web application on the new "Labs" page, which can be found in the application navigation bar. The application will be configured with PageSpeed's core set of filters within a few seconds, and those filters will then be applied to your content.

    When you've enabled PageSpeed we recommend that you test the application to make sure it doesn't break anything. You can also inspect the returned content in your browser and if you hit any snags simply disable PageSpeed and let support know about it. Note that only content transferred over HTTP from your domain will be processed by PageSpeed filters. To optimize HTTPS traffic you can enable SPDY support (although that is currently only enabled on dedicated load balancers and in the beta region).

    We'll make more filters available later on, but for the beta we're starting out with a curated set of core filters, which are considered safe for most web applications. There are a few other cool filters we'll add support for later on - such as automatic sprite image generation and lazy-loading of images. Let us know if there are any filters in the catalog you think we should support!

  • Announcing the Sydney, Australia Region for Heroku Private Spaces (Heroku)
    30 Jan 2017 19:42

    Today we’re happy to announce that the Sydney, Australia region is now generally available for use with Heroku Private Spaces. Sydney joins Virginia, Oregon, Frankfurt, and Tokyo as regions where Private Spaces can be created by any Heroku Enterprise user. Developers can now deploy Heroku apps closer to customers in the Asia-Pacific area to reduce latency and take advantage of the advanced network & trust controls of Spaces to ensure sensitive data stays protected.

    Usage

    To create a Private Space in Sydney, select the Spaces tab in Heroku Dashboard in Heroku Enterprise, then click the "New Space" button and choose "Sydney, Australia" from the Space Region dropdown.

    After a Private Space in Sydney is created, Heroku apps can be created inside it as normal. Heroku Postgres, Redis, and Kafka are also available in Sydney as are a variety of third-party Add-ons.

    Better Latency for Asia-Pacific

    Prior to this release, a Heroku Enterprise developer, from anywhere in the world, could create apps in Spaces in Virginia, Oregon, Tokyo, or Frankfurt, and have them be available to any user in the world. The difference with this release is that apps (and Heroku data services) can be created and hosted in the Sydney region. This will bring far faster access for developers and users of Heroku apps across the Asia-Pacific area. Time-To-First-Byte for a user in Australia accessing an app deployed in a Private Space in the Sydney region is approximately four times better than for that same user accessing an app deployed in a Private Space in Tokyo (approx 0.1s vs 0.4s).

    Extending the Vision of Heroku Private Spaces

    A Private Space, available as part of Heroku Enterprise, is a network isolated group of apps and data services with a dedicated runtime environment, provisioned to Heroku in a geographic region you specify. With Spaces you can build modern apps with the powerful Heroku developer experience and get enterprise-grade secure network topologies. This enables your Heroku applications to securely connect to on-premise systems on your corporate network and other cloud services, including Salesforce.

    With the GA of the Sydney Region, we now bring those isolation, security, and network benefits to Heroku apps and data services in the Asia-Pacific region.

    Learn More

    All Heroku Enterprise customers can immediately begin to create Private Spaces in Sydney and deploy apps there. We’re excited by the possibilities Private Spaces opens up for developers in Australia and Asia-Pacific more broadly - if you want more information, or are an existing Heroku customer and have questions on using and configuring Spaces, please contact us.

  • Announcing Heroku Autoscaling for Web Dynos (Heroku)
    24 Jan 2017 16:42

    We’re excited to announce that Heroku Autoscaling is now generally available for apps using web dynos.

    We’ve always made it seamless and simple to scale apps on Heroku - just move the slider. But we want to go further, and help you in the face of unexpected demand spikes or intermittent activity. Part of our core mission is delivering a first-class operational experience that provides proactive notifications, guidance, and—where appropriate—automated responses to particular application events. Today we take another big step forward in that mission with the introduction of Autoscaling.

    Autoscaling makes it effortless to meet demand by horizontally scaling your web dynos based on what’s most important to your end users: responsiveness. To measure responsiveness, Heroku Autoscaling uses your app’s 95th percentile (p95) response time, an industry-standard metric for assessing user experience. The p95 response time is the number of milliseconds that only 5% of your app’s response times exceed. You can view your app’s p95 response time in the Application Metrics response time plot. Using p95 response time as the trigger for autoscaling ensures that the vast majority of your users experience good performance, without overreacting to performance outliers.
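
    As a rough illustration with made-up numbers, here is how a p95 value relates to a set of response times:

    1,000 requests in a window, sorted from fastest to slowest
    p95 position = 0.95 × 1,000 = the 950th fastest response
    p95 value    = that response's time, say 420 ms
    => only the slowest 5% (50 requests) took longer than 420 ms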

    Autoscaling is easy to set up and use, and it recommends a p95 threshold based on your app’s past 24 hours of response times. Response-based autoscaling ensures that your web dyno formation is always sized for optimal efficiency, while capping your costs based on limits you set. Autoscaling is currently included at no additional cost for apps using Performance and Private web dynos.


    Get Started

    From the Heroku Dashboard, navigate to the Resources tab to enable autoscaling for your web dynos.

    From the web dyno formation dialog set the desired upper and lower limits for your dyno range. With Heroku Autoscaling you won't be surprised by unexpected dyno fees: the cost estimator shows the maximum possible web dyno cost when using autoscaling, expressed either in dyno units for Heroku Enterprise organizations or in dollars.

    Next, enter the desired p95 response time in milliseconds. To make it easy to select a meaningful p95 setting, the median p95 latency for the past 24 hours is provided as guidance. By enabling Email Notifications we’ll let you know if the scaling demand reaches your maximum dyno setting, so you won’t miss a customer request.


    Monitoring Autoscaling

    You can monitor your autoscaling configuration and scaling events in the Events table on Application Metrics and view the corresponding impact on application health.


    When to Use Autoscaling

    Autoscaling is useful when demand on web resources is variable. However, it is not meant to be a panacea for all application health issues that result in latency. For example, lengthy response times may be due to a downstream resource, such as a slow database query. In this case scaling web dynos in the absence of sufficient database resources or query optimization could exacerbate the problem.

    In order to identify whether autoscaling is appropriate for your environment we recommend that you load test prior to implementing autoscaling in production, and use Threshold Alerting to monitor your p95 response times and error rates. If you plan to load test please refer to our Load Testing Guidelines for Support notification requirements. As with manual scaling, you may need to tune your downstream components in anticipation of higher request volumes. Additional guidance on optimization is available in the Scaling documentation.

    How it Works

    Heroku's autoscaling model employs Little's Law to determine the optimal number of web dynos needed to maintain your current request throughput while keeping web request latency within your specified p95 response time threshold. The deficit or excess of dynos is measured as Ldiff, which takes into consideration the past hour of traffic. For example in the following simulation, at time point 80 minutes there is a spike in response time (latency) and a corresponding dip in Ldiff, indicating that there is a deficit in the existing number of web dynos with respect to the current throughput and response time target. The platform will add an additional web dyno and reassess the Ldiff. This process will be repeated until the p95 response time is within your specified limit or you have reached your specified upper dyno limit. A similar approach is used for scaling in.
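
    As a back-of-the-envelope illustration of Little's Law (the specific numbers below are made up and not Heroku's actual tuning):

    Little's Law: L = λ × W  (requests in flight = throughput × response time)

    throughput λ  = 200 requests/second
    target p95 W  = 0.5 seconds
    concurrency L = 200 × 0.5 = 100 requests in flight

    If a single web dyno comfortably serves about 25 concurrent requests,
    the model would target roughly 100 / 25 = 4 web dynos for this traffic.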

    Autoscaling simulation

    Find Out More

    Autoscaling has been one of your top requested features when it comes to operational experience - thank you to everyone who gave us feedback during the beta and in our recent ops survey. For more details on autoscaling refer to the Dyno Scaling documentation. Learn more about Heroku's other operational features here.

    If there’s an autoscaling enhancement or metrics-driven feature you would like to see, you can reach us at metrics-feedback@heroku.com.

  • The Heroku 2016 Retrospective (Heroku)
    03 Jan 2017 00:00

    As we begin 2017, we want to thank you for supporting Heroku. Your creativity and innovation continues to inspire us, and pushed us to deliver even more new products and features in 2016. We especially want to thank everyone who helped us by beta testing, sharing Heroku with others, and providing feedback. Here are the highlights of what became generally available in 2016.

    Advancing the Developer Experience

    Heroku Pipelines

    A new way to structure, manage and visualize continuous delivery.

    Heroku Review Apps

    Test code at a shareable URL using disposable Heroku apps that spin up with each GitHub pull request.

    Free SSL for Apps on Paid Dynos

    Get SSL encryption on custom domains for free on apps that use paid dynos.

    The New Heroku CLI

    Take advantage of the CLI’s faster performance and new usability features.

    Heroku Teams

    Powerful collaboration, administration and centralized billing capabilities to build and run more effective development teams.

    Flexible Dyno Hours

    Run a free app 24/7, or many apps on an occasional basis, using a pool of account-based free dyno hours.

    Threshold Alerting

    Let the platform keep your apps healthy: get proactive alerts based on app responsiveness and error rates.

    Session Affinity

    Route requests from a given browser to the same dyno, so apps with ‘sticky sessions’ can take advantage of Heroku’s flexible scaling.

    Build Data-Centric Apps on Heroku

    Apache Kafka on Heroku

    Build data-intensive apps with ease using the leading open source solution for managing event streams.

    PostgreSQL 9.6

    Speed up sequential scans for faster analytics applications, create indexes without blocking writes on tables in production apps, and more.

    Heroku External Objects

    Read and write Postgres data from Salesforce so you can integrate application data in Heroku with business processes inside Salesforce.

    Heroku Connect APIs

    Build repeatable automation for configuring Heroku Connect environments, managing connections across Salesforce orgs, and integrating with existing operational systems.

    Heroku Enterprise: Advanced Trust Controls & Scale for Large Organizations

    Heroku Private Spaces

    Have your own private Heroku as a service, with configurable network boundaries, global regions, and private data services for your most demanding enterprise apps.

    SSO for Heroku

    Use SAML 2.0 identity providers like Salesforce Identity, Ping and Okta for single sign-on to Heroku Enterprise.

    Add-on Controls

    Standardize the add-ons your team uses by whitelisting them within your Heroku Enterprise organization.

    Onwards!

    We look forward to continuing our innovation across developer experience, data services, collaboration, and enterprise controls to help you build more amazing applications. Have a product or feature you'd like to see in 2017? Send us your feedback.

    P.S. get your Heroku created ASCII artwork here and here.

  • Ruby 2.4 Released: Faster Hashes, Unified Integers and Better Rounding (Heroku)
    25 Dec 2016 19:00

    The Ruby maintainers continued their annual tradition by gifting us a new Ruby version to celebrate the holiday: Ruby 2.4 is now available and you can try it out on Heroku.

    Ruby 2.4 brings some impressive new features and performance improvements to the table, here are a few of the big ones:

    Binding#irb

    Have you ever used p or puts to get the value of a variable in your code? If you’ve been writing Ruby the odds are pretty good that you have. The alternative REPL Pry (http://pryrepl.org/) broke many of us of this habit, but installing a gem to get a REPL during runtime isn’t always an option, or at least not a convenient one.

    Enter binding.irb, a new native runtime invocation for the IRB REPL that ships with Ruby. Now you can simply add binding.irb to your code to open an IRB session and have a look around:

    # ruby-2.4.0
    class SuperConfusing
      def what_is_even_happening_right_now
        @x = @xy[:y] ** @x
    
        binding.irb
        # open a REPL here to examine @x, @xy,
        # and possibly your life choices
      end
    end
    

    One Integer to Rule Them All

    Ruby previously used 3 classes to handle integers: the abstract super class Integer, the Fixnum class for small integers and the Bignum class for large integers. You can see this behavior yourself in Ruby 2.3:

    # ruby-2.3.3
    irb> 1.class
    # => Fixnum
    irb> (2**100).class
    # => Bignum
    irb> Fixnum.superclass
    # => Integer
    irb> Bignum.superclass
    # => Integer
    

    Ruby 2.4 unifies the Fixnum and Bignum classes into a single concrete class Integer:

    # ruby-2.4.0
    irb> 1.class
    # => Integer
    irb> (2**100).class
    # => Integer
    

    Why Did We Ever Have Two Classes of Integer?

    To improve performance Ruby stores small numbers in a single native machine word whenever possible, either 32 or 64 bits in length depending on your processor. A 64-bit processor has a 64-bit word length; the 64 in this case describes the size of the registers on the processor.

    The registers allow the processor to handle simple arithmetic and logical comparisons, for numbers up to the word size, by itself, which is much faster than manipulating values stored in RAM.

    On my laptop it's more than twice as fast for me to add 1 to a Fixnum a million times than it is to do the same with a Bignum:

    # ruby-2.3.3
    require "benchmark"
    
    fixnum = 2**40
    bignum = 2**80
    
    n = 1_000_000
    
    Benchmark.bm do |x|
      x.report("Adding #{fixnum.class}:") { n.times { fixnum + 1 } }
      x.report("Adding #{bignum.class}:") { n.times { bignum + 1 } }
    end
    
    # =>
    #                     user     system      total        real
    # Adding Fixnum:  0.190000   0.010000   0.200000 (  0.189790)
    # Adding Bignum:  0.460000   0.000000   0.460000 (  0.471123)
    

    When a number is too big to fit in a native machine word Ruby will store that number differently, automatically converting it to a Bignum behind the scenes.

    How Big Is Too Big?

    Well, that depends. It depends on the processor you’re using, as we’ve discussed, but it also depends on the operating system and the Ruby implementation you’re using.

    Wait It Depends on My Operating System?

    Yes, different operating systems use different C data type models.

    When processors first started shipping with 64-bit registers it became necessary to augment the existing data types in the C language, to accommodate larger register sizes and take advantage of performance increases.

    Unfortunately, the C language doesn't provide a mechanism for adding new fundamental data types. These augmentations had to be accomplished via alternative data models like LP64, ILP64 and LLP64.

    LL-What Now?

    LP64, ILP64 and LLP64 are some of the data models used in the C language. This is not an exhaustive list of the available C data models but these are the most common.

    The first few characters in each of these acronyms describe the data types they affect. For example, the "L" and "P" in the LP64 data model stand for long and pointer, because LP64 uses 64-bits for those data types.

    These are the sizes of the relevant data types for these common data models:

    |       | int | long | long long | pointer |
    |-------|-----|------|-----------|---------|
    | LP64  | 32  | 64   | NA        | 64      |
    | ILP64 | 64  | 64   | NA        | 64      |
    | LLP64 | 32  | 32   | 64        | 64      |
    

    Almost all UNIX and Linux implementations use LP64, including OS X. Windows uses LLP64, which includes a new long long type, just like long but longer.

    So the maximum size of a Fixnum depends on your processor and your operating system, in part. It also depends on your Ruby implementation.

    Fixnum Size by Ruby Implementation

    | Fixnum Range         | MIN             | MAX              |
    |----------------------|-----------------|------------------|
    | 32-bit CRuby (ILP32) | -2**30          | 2**30 - 1        |
    | 64-bit CRuby (LLP64) | -2**30          | 2**30 - 1        |
    | 64-bit CRuby (LP64)  | -2**62          | 2**62 - 1        |
    | JRuby                | -2**63          | 2**63 - 1        |
    

    The range of Fixnum can vary quite a bit between Ruby implementations.

    In JRuby, for example, a Fixnum is any number between -2**63 and 2**63 - 1. CRuby will either have Fixnum values between -2**30 and 2**30 - 1 or -2**62 and 2**62 - 1, depending on the underlying C data model.

    Your Numbers Are Wrong, You're Not Using All the Bits

    You're right, even though we have 64 bits available we're only using 62 of them in CRuby and 63 in JRuby. Both of these implementations use two's complement integers, binary values that use one of the bits to store the sign of the number. So that accounts for one of our missing bits; how about the other one?

    In addition to the sign bit, CRuby uses one of the bits as a FIXNUM_FLAG, to tell the interpreter whether or not a given word holds a Fixnum or a reference to a larger number. The sign bit and the flag bit are at opposite ends of the 64-bit word, and the 62 bits left in the middle are the space we have to store a number.

    In JRuby we have 63 bits to store our Fixnum, because JRuby stores both Fixnum and Bignum as 64-bit signed values; they don't need a FIXNUM_FLAG.

    Why Are They Changing It Now?

    The Ruby team feels that the difference between a Fixnum and a Bignum is ultimately an implementation detail, and not something that needs to be exposed as part of the language.

    Using the Fixnum and Bignum classes directly in your code can lead to inconsistent behavior, because the range of those values depends on so many things. They don't want to encourage you to depend on the ranges of these different Integer types, because it makes your code less portable.

    Unification also significantly simplifies Ruby for beginners. When you're teaching your friends Ruby you no longer need to explain the finer points of 64-bit processor architecture.

    Rounding Changes

    In Ruby Float#round has always rounded floating point numbers up for decimal values greater than or equal to .5, and down for anything less, much as you learned to expect in your arithmetic classes.

    # ruby-2.3.3
    irb> (2.4).round
    # => 2
    irb> (2.5).round
    # => 3
    

    During the development of Ruby 2.4 there was a proposal to change this default rounding behavior to instead round to the nearest even number, a strategy known as half to even rounding, or Gaussian rounding (among many other names).

    # ruby-2.4.0-preview3
    irb> (2.4).round
    # => 2
    irb> (2.5).round
    # => 2
    irb> (3.5).round
    # => 4
    

    The half to even strategy would only have changed rounding behavior for tie-breaking; numbers that are exactly halfway (.5) would have been rounded down for even numbers, and up for odd numbers.

    Why Would Anyone Do That?

    The Gaussian rounding strategy is commonly used in statistical analysis and financial transactions, because the rounded values alter the average magnitude of large sample sets less significantly.

    As an example let's generate a large set of random values that all end in .5:

    # ruby-2.3.3
    irb> halves = Array.new(1000) { rand(1..1000) + 0.5 }
    # => [578.5...120.5] # 1000 random numbers between 1.5 and 1000.5
    

    Now we'll calculate the average after forcing our sum to be a float, to ensure we don't end up doing integer division:

    # ruby-2.3.3
    irb> average = halves.inject(:+).to_f / halves.size
    # => 510.675
    

    The actual average of all of our numbers is 510.675, so the ideal rounding strategy should give us a rounded average as close to that number as possible.

    Let's see how close we get using the existing rounding strategy:

    # ruby-2.3.3
    irb> round_up_average = halves.map(&:round).inject(:+).to_f / halves.size
    # => 511.175
    irb> (average - round_up_average).abs
    # => 0.5
    

    We're off the average by 0.5 when we consistently round ties up, which makes intuitive sense. So let's see if we can get closer with Gaussian rounding:

    # ruby-2.3.3
    irb> rounded_halves = halves.map { |n| n.to_i.even? ? n.floor : n.ceil }
    # => [578...120]
    irb> gaussian_average = rounded_halves.inject(:+).to_f / halves.size
    # => 510.664
    irb> (average - gaussian_average).abs
    # => 0.011000000000024102
    

    It would appear we have a winner. Rounding ties to the nearest even number brings us more than 97% closer to our actual average. For larger sample sets we can expect the average from Gaussian rounding to be almost exactly the actual average.

    This is why Gaussian rounding is the recommended default rounding strategy in the IEEE Standard for Floating-Point Arithmetic (IEEE 754).

    So Ruby Decided to Change It Because of IEEE 754?

    Not exactly, it actually came to light because Gaussian rounding is already the default strategy for the Kernel#sprintf method, and an astute user filed a bug on Ruby: "Rounding modes inconsistency between round versus sprintf".

    Here we can clearly see the difference in behavior between Kernel#sprintf and Float#round:

    # ruby 2.3.3
    irb(main):001:0> sprintf('%1.0f', 12.5)
    # => "12"
    irb(main):002:0> (12.5).round
    # => 13
    

    The inconsistency in this behavior prompted the proposed change, which actually made it into one of the Ruby 2.4 preview versions, ruby-2.4.0-preview3:

    # ruby 2.4.0-preview3
    irb(main):006:0> sprintf('%1.0f', 12.5)
    # => "12"
    irb(main):007:0> 12.5.round
    # => 12
    

    In ruby-2.4.0-preview3 rounding with either Kernel#sprintf or Float#round will give the same result.

    Ultimately Matz decided this fix should not alter the default behavior of Float#round when another user reported a bug in Rails: "Breaking change in how #round works".

    The Ruby team decided to compromise and add a new keyword argument to Float#round to allow us to set alternative rounding strategies ourselves:

    # ruby 2.4.0-rc1
    irb(main):001:0> (2.5).round
    # => 3
    irb(main):008:0> (2.5).round(half: :down)
    # => 2
    irb(main):009:0> (2.5).round(half: :even)
    # => 2
    

    The keyword argument :half can take either :down or :even and the default behavior is still to round up, just as it was before.

    Why Preview Versions Are Not for Production

    Interestingly, before the default rounding behavior was briefly changed for 2.4.0-preview3, there was an unusual Kernel#sprintf bug in 2.4.0-preview2:

    # ruby 2.4.0-preview2
    irb> numbers = (1..20).map { |n| n + 0.5 }
    # => [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5]
    irb> numbers.map { |n| sprintf('%1.0f', n) }
    # => ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "12", "14", "14", "16", "16", "18", "18", "20", "20"]
    

    In this example Kernel#sprintf appears to be rounding numbers less than 12 up as though it was using the Float#round method's default behavior, which was still in place at this point.

    The preview releases before and after 2.4.0-preview2, both 2.4.0-preview1 and 2.4.0-preview3, show the expected sprintf behavior, consistent with ruby-2.3.3:

    # ruby 2.4.0-preview1
    irb> numbers.map { |n| sprintf('%1.0f', n) }
    # => ["2", "2", "4", "4", "6", "6", "8", "8", "10", "10", "12", "12", "14", "14", "16", "16", "18", "18", "20", "20"]
    
    # ruby 2.4.0-preview3
    irb> numbers.map { |n| sprintf('%1.0f', n) }
    # => ["2", "2", "4", "4", "6", "6", "8", "8", "10", "10", "12", "12", "14", "14", "16", "16", "18", "18", "20", "20"]
    

    I discovered this by accident while researching this article and started digging through the 2.4.0-preview2 changes to see if I could identify the cause. I found this commit from Nobu:

    commit 295f60b94d5ff6551fab7c55e18d1ffa6a4cf7e3
    Author: nobu <nobu@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
    Date:   Sun Jul 10 05:27:27 2016 +0000
    
        util.c: round nearly middle value
    
        * util.c (ruby_dtoa): [EXPERIMENTAL] adjust the case that the
          Float value is close to the exact but unrepresentable middle
          value of two values in the given precision, as r55604.
    
        git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@55621 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
    

    Kernel#sprintf Accuracy in Ruby 2.4

    This was an early effort by Nobu to handle cases where floating point numbers rounded inconsistently with Kernel#sprintf in ruby-2.3.3 (and before):

    # ruby-2.3.3
    irb> numbers = (0..9).map { |n| "5.0#{n}5".to_f }
    # => [5.005, 5.015, 5.025, 5.035, 5.045, 5.055, 5.065, 5.075, 5.085, 5.095]
    irb> numbers.map { |n| sprintf("%.2f", n) }
    # => ["5.00", "5.01", "5.03", "5.04", "5.04", "5.05", "5.07", "5.08", "5.08", "5.09"]
    

    In the example above notice that 5.035 and 5.045 both round to 5.04. No matter what strategy Kernel#sprintf is using this is clearly unexpected. The cause turns out to be hidden precision in the underlying binary values: a Float can't represent 5.035 or 5.045 exactly, so sprintf was rounding the slightly-off values that are actually stored.
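
    You can peek at that hidden precision yourself by asking sprintf for more digits than the literal suggests. Neither 5.035 nor 5.045 has an exact binary representation, so the stored Float is always slightly above or below the number you typed, and that is the value the old sprintf was faithfully rounding (the exact trailing digits depend on the stored value, so I won't guess at them here):

    # ruby-2.3.3
    irb> sprintf("%.20f", 5.035)  # prints the value actually stored, not exactly 5.035
    irb> sprintf("%.20f", 5.045)  # likewise, not exactly 5.045
    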

    Not to worry though, the final version of Nobu's fixes resolves this issue, and it will be available in Ruby 2.4.

    Kernel#sprintf will now consistently apply half-to-even rounding:

    # ruby-2.4.0-rc1
    irb> numbers = (0..9).map { |n| "5.0#{n}5".to_f }
    # => [5.005, 5.015, 5.025, 5.035, 5.045, 5.055, 5.065, 5.075, 5.085, 5.095]
    irb> numbers.map { |n| sprintf("%.2f", n) }
    # => ["5.00", "5.02", "5.02", "5.04", "5.04", "5.06", "5.06", "5.08", "5.08", "5.10"]
    

    Better Hashes

    Ruby 2.4 introduces some significant changes to the hash table backing Ruby's Hash object. These changes were prompted by Vladimir Makarov when he submitted a patch to Ruby's hash table earlier this year.

    If you have a couple of hours to spare that issue thread is an entertaining read, but on the off-chance you're one of those busy developers I'll go through the major points here. First we need to cover some Ruby Hash basics.

    If you're already an expert on Ruby hash internals feel free to skip ahead and read about the specific hash changes in Ruby 2.4.

    How Ruby Implements Hash

    Let's imagine for a moment that we have a severe case of "not invented here" syndrome, and we've decided to make our own Hash implementation in Ruby using arrays. I'm relatively certain we're about to do some groundbreaking computer science here so we'll call our new hash TurboHash, as it's certain to be faster than the original:

    # turbo_hash.rb
    class TurboHash
      attr_reader :table
    
      def initialize
        @table = []
      end
    end
    

    We'll use the @table array to store our table entries. We gave ourselves a reader to access it so it's easy to peek inside our hash.

    We're definitely going to need methods to set and retrieve elements from our revolutionary hash so let's get those in there:

    # turbo_hash.rb
    class TurboHash
      # ...
    
      def [](key)
        # remember our entries look like this:
        # [key, value]
    
        find(key).last
      end
    
      def find(key)
        # Enumerable#find here will return the first entry that makes
        # our block return true, otherwise it returns nil.
    
        @table.find do |entry|
          key == entry.first
        end
      end
    
      def []=(key, value)
        entry = find(key)
    
        if entry
          # If we already stored it just change the value
          entry[1] = value
        else
          # otherwise add a new entry
          @table << [key, value]
        end
      end
    end
    

    Excellent, we can set and retrieve keys. It's time to set up some benchmarking and admire our creation:

    require "benchmark"
    
    legacy = Hash.new
    turbo  = TurboHash.new
    
    n = 10_000
    
    def set_and_find(target)
      key = rand # a random key for each call
    
      target[key] = rand
      target[key]
    end
    
    Benchmark.bm do |x|
      x.report("Hash: ") { n.times { set_and_find(legacy) } }
      x.report("TurboHash: ") { n.times { set_and_find(turbo) } }
    end
    
    #                  user     system      total        real
    # Hash:        0.010000   0.000000   0.010000 (  0.009026)
    # TurboHash:  45.450000   0.070000  45.520000 ( 45.573937)
    

    Well, that could have gone better: our implementation is about 5000 times slower than Ruby's Hash. This is obviously not the way Hash is actually implemented.

    In order to find an element in @table our implementation traverses the entire array on each iteration; towards the end we're checking nearly 10k entries one at a time.

    So let's come up with something better. The iteration is killing us; if we can find a way to index instead of iterating we'll be way ahead.

    If we knew our keys were always going to be integers we could just store the values at their indexes inside of @table and look them up by their indexes later.

    The issue, of course, is that our keys can be anything; we're not building some cheap knock-off hash that can only take integers.

    We need a way to turn our keys into numbers in a consistent way, so "some_key" will give us the same number every time, and we can regenerate that number to find it again later.

    It turns out that Object#hash is perfect for this purpose:

    irb> "some_key".hash
    # => 3031662902694417109
    irb> "some_other_key".hash
    # => -3752665667844152731
    
    irb> "some_key".hash
    # => 3031662902694417109
    

    Object#hash will return unique(ish) integers for any object in Ruby, and you'll get the same number back every time you call it on an object that's "equal" to the previous object.

    For example, every time you create a string in Ruby you'll get a unique object:

    irb> a = "some_key"
    # => "some_key"
    irb> a.object_id
    # => 70202008509060
    
    irb> b = "some_key"
    # => "some_key"
    irb> b.object_id
    # => 70202008471340
    

    These are clearly distinct objects, but they will have the same Object#hash return value because a == b:

    irb> a.hash
    # => 3031662902694417109
    irb> b.hash
    # => 3031662902694417109
    

    These hash return values are huge and sometimes negative, so we're going to use the remainder after dividing by some small number as our index instead:

    irb> a.hash % 11
    # => 8
    

    We can use this new number as the index in @table where we store the entry. When we want to look up an item later we can simply repeat the operation to know exactly where to find it.

    This raises another issue, however: our new indexes are much less unique than they were originally; they range between 0 and 10. If we store more than 11 items we are certain to have collisions, overwriting existing entries.
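
    You can convince yourself of that with a quick pigeonhole check. The exact indexes will differ on your machine, because String#hash is seeded per process, but with 12 keys and only 11 possible indexes at least two keys must share one:

    irb> indexes = (1..12).map { |n| "key_#{n}".hash % 11 }
    irb> indexes.uniq.length < indexes.length
    # => true
    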

    Rather than storing the entries directly in the table we'll put them inside arrays called "bins". Each bin will end up having multiple entries, but traversing the bins will still be faster than traversing the entire table.

    Armed with our new indexing system we can now make some improvements to our TurboHash.

    Our @table will hold a collection of bins and we'll store our entries in the bin that corresponds to key.hash % 11:

    # turbo_hash.rb
    class TurboHash
      NUM_BINS = 11
    
      attr_reader :table
    
      def initialize
        # We know our indexes will always be between 0 and 10
        # so we need an array of 11 bins.
        @table = Array.new(NUM_BINS) { [] }
      end
    
      def [](key)
        find(key).last
      end
    
      def find(key)
        # now we're searching inside the bins instead of the whole table
        bin_for(key).find do |entry|
          key == entry.first
        end
      end
    
      def bin_for(key)
        # since hash will always return the same thing we know right where to look
        @table[index_of(key)]
      end
    
      def index_of(key)
        # a pseudorandom number between 0 and 10
        key.hash % NUM_BINS
      end
    
      def []=(key, value)
        entry = find(key)
    
        if entry
          entry[1] = value
        else
          # store new entries in the bins
          bin_for(key) << [key, value]
        end
      end
    end
    

    Let's benchmark our new and improved implementation:

                     user     system      total        real
    Hash:        0.010000   0.000000   0.010000 (  0.012918)
    TurboHash:   3.800000   0.010000   3.810000 (  3.810126)
    

    So that's pretty good, I guess: using bins decreased the time for TurboHash by more than 90%. Those sneaky Ruby maintainers are still crushing us, though; let's see what else we can do.

    It occurs to me that our benchmark is creating 10_000 entries but we only have 11 bins. Each time we iterate through a bin we're actually going over a pretty large array now.

    Let's check out the sizes on those bins after the benchmark finishes:
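
    Here is one way to produce a readout like the one below. The helper is my own, not part of the original benchmark; it just scales each bin against the largest one:

    def print_bins(turbo)
      max = turbo.table.map(&:length).max.to_f
    
      puts "Bin:  Relative Size:          Length:"
      puts "-" * 40
    
      turbo.table.each_with_index do |bin, index|
        bar = "+" * (bin.length / max * 20).round
        puts format("%-5d %-23s (%d)", index, bar, bin.length)
      end
    end
    
    print_bins(turbo)
    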

    Bin:  Relative Size:          Length:
    ----------------------------------------
    0     +++++++++++++++++++     (904)
    1     ++++++++++++++++++++    (928)
    2     +++++++++++++++++++     (909)
    3     ++++++++++++++++++++    (915)
    4     +++++++++++++++++++     (881)
    5     +++++++++++++++++++     (886)
    6     +++++++++++++++++++     (876)
    7     ++++++++++++++++++++    (918)
    8     +++++++++++++++++++     (886)
    9     ++++++++++++++++++++    (952)
    10    ++++++++++++++++++++    (945)
    

    That's a nice even distribution of entries but those bins are huge. How much faster is TurboHash if we increase the number of bins to 19?

                     user     system      total        real
    Hash:        0.020000   0.000000   0.020000 (  0.021516)
    TurboHash:   2.870000   0.070000   2.940000 (  3.007853)
    
    Bin:  Relative Size:          Length:
    ----------------------------------------
    0     ++++++++++++++++++++++  (548)
    1     +++++++++++++++++++++   (522)
    2     ++++++++++++++++++++++  (547)
    3     +++++++++++++++++++++   (534)
    4     ++++++++++++++++++++    (501)
    5     +++++++++++++++++++++   (528)
    6     ++++++++++++++++++++    (497)
    7     +++++++++++++++++++++   (543)
    8     +++++++++++++++++++     (493)
    9     ++++++++++++++++++++    (500)
    10    +++++++++++++++++++++   (526)
    11    ++++++++++++++++++++++  (545)
    12    +++++++++++++++++++++   (529)
    13    ++++++++++++++++++++    (514)
    14    ++++++++++++++++++++++  (545)
    15    ++++++++++++++++++++++  (548)
    16    +++++++++++++++++++++   (543)
    17    ++++++++++++++++++++    (495)
    18    +++++++++++++++++++++   (542)
    

    We gained another 25%! That's pretty good, I bet it gets even better if we keep making the bins smaller. This is a process called rehashing, and it's a pretty important part of a good hashing strategy.

    Let's cheat and peek inside st.c to see how Ruby handles increasing the table size to accommodate more bins:

    /* https://github.com/ruby/ruby/blob/ruby_2_3/st.c#L38 */
    
    #define ST_DEFAULT_MAX_DENSITY 5
    #define ST_DEFAULT_INIT_TABLE_SIZE 16
    

    Ruby's hash table starts with 16 bins. How do they get away with 16 bins? Weren't we using prime numbers (like our 11 and 19) to reduce collisions?

    We were, but using prime numbers for hash table size is really just a defense against bad hashing functions. Ruby has a much better hashing function today than it once did, so the Ruby maintainers stopped using prime numbers in Ruby 2.2.0.

    What's This Other Default Max Density Number?

    The ST_DEFAULT_MAX_DENSITY defines the average maximum number of entries Ruby will allow in each bin before rehashing: choosing the next largest power of two and recreating the hash table with the new, larger size.

    You can see the conditional that checks for this in the add_direct function from st.c:

    /* https://github.com/ruby/ruby/blob/ruby_2_3/st.c#L463 */
    
    if (table->num_entries > ST_DEFAULT_MAX_DENSITY * table->num_bins) {...}
    

    Ruby's hash table tracks the number of entries as they're added using the num_entries value on table. This way Ruby doesn't need to count the entries to decide if it's time to rehash, it just checks to see if the number of entries is more than 5 times the number of bins.

    Let's implement some of the improvements we stole from Ruby to see if we can speed up TurboHash:

    class TurboHash
      STARTING_BINS = 16
    
      attr_accessor :table
    
      def initialize
        @max_density = 5
        @entry_count = 0
        @bin_count   = STARTING_BINS
        @table       = Array.new(@bin_count) { [] }
      end
    
      def grow
        # use bit shifting to get the next power of two and reset the table size
        @bin_count = @bin_count << 1
    
        # create a new table with a much larger number of bins
        new_table = Array.new(@bin_count) { [] }
    
        # copy each of the existing entries into the new table at their new location,
        # as returned by index_of(key)
        @table.flatten(1).each do |entry|
          new_table[index_of(entry.first)] << entry
        end
    
        # Finally we overwrite the existing table with our new, larger table
        @table = new_table
      end
    
      def full?
        # our bins are full when the number of entries surpasses 5 times the number of bins
        @entry_count > @max_density * @bin_count
      end
    
      def [](key)
        find(key).last
      end
    
      def find(key)
        bin_for(key).find do |entry|
          key == entry.first
        end
      end
    
      def bin_for(key)
        @table[index_of(key)]
      end
    
      def index_of(key)
        # use @bin_count because it now changes each time we resize the table
        key.hash % @bin_count
      end
    
      def []=(key, value)
        entry = find(key)
    
        if entry
          entry[1] = value
        else
          # grow the table whenever we run out of space
          grow if full?
    
          bin_for(key) << [key, value]
          @entry_count += 1
        end
      end
    end
    

    So what's the verdict?

                      user     system      total        real
    Hash:        0.010000   0.000000   0.010000 (  0.012012)
    TurboHash:   0.130000   0.010000   0.140000 (  0.133795)
    

    We lose. Even though our TurboHash is now 95% faster than our last version, Ruby still beats us by an order of magnitude.

    All things considered, I think TurboHash fared pretty well. I'm sure there are some ways we could further improve this implementation but it's time to move on.

    At long last we have enough background to explain what exactly is about to nearly double the speed of Ruby hashes.

    What Actually Changed

    Speed! Ruby 2.4 hashes are significantly faster. The changes introduced by Vladimir Makarov were designed to take advantage of modern processor caching improvements by focusing on data locality.

    "This implementation speeds up the Ruby hash table benchmarks in average by more 40% on Intel Haswell CPU."

    https://github.com/ruby/ruby/blob/trunk/st.c#L93

    Oh Good! What?

    Processors like the Intel Haswell series use several levels of caching to speed up operations that reference the same region of memory.

    When the processor reads a value from memory it doesn't just take the value it needs; it grabs a large piece of memory nearby, operating on the assumption that it is likely going to be asked for some of that data in the near future.

    The exact algorithms processors use to determine which bits of memory should get loaded into each cache are somewhat difficult to discover. Manufacturers consider these strategies to be trade secrets.

    What is clear is that accessing any of the levels of caching is significantly faster than going all the way out to pokey old RAM to get information.

    How Much Faster?

    Real numbers here are almost meaningless to discuss because they depend on so many factors within a given system, but generally speaking we can say that L1 cache hits (the fastest level of caching) could speed up memory access by two orders of magnitude or more.

    An L1 cache hit can complete in half a nanosecond. For reference consider that a photon can only travel half a foot in that amount of time. Fetching from main memory will generally take at least 100 nanoseconds.

    Got It, Fast... Therefore Data Locality?

    Exactly. If we can ensure that the data Ruby accesses frequently is stored close together in main memory, we significantly increase our chances of winning a coveted spot in one of the caching levels.

    One of the ways to accomplish this is to decrease the overall size of the entries themselves. The smaller the entries are, the more likely they are to end up in the same caching level.

    In our TurboHash implementation above our entries were stored as simple arrays, but in ruby-2.3.3 table entries were actually stored in a linked list. Each of the entries contained a next pointer that pointed to the next entry in the list. If we can find a way to get by without that pointer and make the entries smaller we will take better advantage of the processor's built-in caching.

    The new approach in ruby-2.4.0-rc1 actually goes even further than just removing the next pointer: it removes the entries from the bins entirely. Instead we store the entries in a separate array, the "entries array", and we record the indexes for those entries in the bins array, referenced by their keys.

    This approach is known as "open addressing".

    Open Addressing

    Ruby has historically used "closed addressing" in its hash table, also known as "open hashing". The new alternative approach proposed by Vladimir Makarov uses "open addressing", also known as "closed hashing". I get that naming things is hard, but this can really get pretty confusing. For the rest of this discussion, I will only use open addressing to refer to the new implementation, and closed addressing to refer to the former.

    The reason open addressing is considered open is that the entries are no longer confined to the bins. The table entries themselves are not stored directly in the bins anymore, as they are in a closed addressing hash table, but rather in a separate entries array, ordered by insertion.

    Open addressing uses the bins array to map keys to their index in the entries array.

    Let's set a value in an example hash that uses open addressing:

    # ruby-2.4.0-rc1
    irb> my_hash["some_key"] = "some_value"
    

    When we set "some_key" in an open addressing hash table Ruby will use the hash of the key to determine where our new key-index reference should live in the bins array:

    irb> "some_key".hash
    # => -3336246172874487271
    

    Ruby first appends the new entry to the entries array, noting the index where it was stored. Ruby then uses the hash above to determine where in the bins array to store the key, referencing that index.

    Remember that the entry itself is not stored in the bins array; the key only references the index of the entry in the entries array.
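
    As a rough mental model, here is a toy sketch of the two-array layout (my own illustration in plain Ruby, not Ruby's actual C structures):

    # entries holds the real data, in insertion order
    entries = []
    # bins holds only indexes into entries; 16 slots to mirror the C default
    bins = Array.new(16)
    
    key, value = "some_key", "some_value"
    entries << [key, value]                        # append the entry itself
    bins[key.hash % bins.size] = entries.size - 1  # the bin stores just the index
    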

    Determining the Bin

    The lower bits of the key's hash itself are used to determine where it goes in the bins array.

    Because we're not using all of the available information from the key's hash this process is "lossy", and it increases the chances of a later hash collision when we go to find a bin for our key.

    However, the cost of potential collisions is offset by the fact that choosing a bin this way is significantly faster.

    In the past, Ruby has used prime numbers to determine the size of the bins array. This approach gave some additional assurance that a hashing algorithm which didn't return evenly distributed hashes would not cause a single bin to become unbalanced in size.

    The bin size was used to mod the computed hash, and because the bin size was prime, it decreased the risk of hash collisions as it was unlikely to be a common factor of both computed hashes.

    Since version 2.2.0 Ruby has used bin array sizes that correspond to powers of two (16, 32, 64, 128, etc.). When we know the bin count is a power of two we can compute a bin index from the lower bits of the hash with a single bit mask, so we find out where to store our entry reference much more quickly.
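
    The trick is easy to demonstrate from Ruby, even though the real selection happens in C inside st.c. When the bin count is a power of two, taking the hash modulo the bin count gives the same answer as masking off the lower bits, and the mask avoids a division entirely:

    irb> hash = "some_key".hash  # a large, possibly negative integer
    irb> bins = 64               # always a power of two since Ruby 2.2.0
    irb> (hash % bins) == (hash & (bins - 1))
    # => true
    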

    What's Wrong with Prime Modulo Mapping?

    Dividing big numbers by primes is slow. Dividing a 64-bit number (a hash) by a prime can take more than 100 CPU cycles for each iteration, which is even slower than accessing main memory.

    Even though the new approach may produce more hash collisions, it ultimately improves performance, because any collisions that do occur are resolved by probing the available bins linearly.

    Linear Probing

    The open addressing strategy in Ruby 2.4 uses a "full cycle linear congruential generator".

    This is just a function that generates pseudorandom numbers based on a seed, much like Ruby's Random#rand method.

    Given the same seed the Random#rand method will generate the same sequence of numbers, even if we create a new instance:

    irb> r = Random.new(7)
    # => #<Random:0x007fee63030d50>
    irb> r.rand(1..100)
    # => 48
    irb> r.rand(1..100)
    # => 69
    irb> r.rand(1..100)
    # => 26
    
    irb> r = Random.new(7)
    # => #<Random:0x007fee630ca928>
    irb> r.rand(1..100)
    # => 48
    irb> r.rand(1..100)
    # => 69
    irb> r.rand(1..100)
    # => 26
    
    # Note that, given the same seed, a Random instance produces the same
    # sequence every time, even in a separate Ruby process.
    

    Similarly a linear congruential generator will generate the same numbers in sequence if we give it the same starting values.

    Linear Congruential Generator (LCG)

    This is the algorithm for a linear congruential generator:

    X[n+1] = (a * X[n] + c) % m

    For carefully chosen values of a, c, m and initial seed X[0] the values of the sequence X will be pseudorandom.

    Here are the rules for choosing these values:

    • m must be greater than 0 (m > 0)
    • a must be greater than 0 and less than m (0 < a < m)
    • c must be greater than or equal to 0 and less than m (0 <= c < m)
    • X[0] must be greater than or equal to 0 and less than m (0 <= X[0] < m)

    Implemented in Ruby the LCG algorithm looks like this:

    irb> a, x_n, c, m = [5, 7, 3, 16]
    # => [5, 7, 3, 16]
    
    irb> x_n = (a * x_n + c) % m
    # => 6
    irb> x_n = (a * x_n + c) % m
    # => 1
    irb> x_n = (a * x_n + c) % m
    # => 8
    

    For the values chosen above that sequence will always return 6, 1 and 8, in that order. Because I've chosen the initial values with some additional constraints, the sequence will also choose every available number before it comes back around to 6.

    An LCG that returns each number before returning any number twice is known as a "full cycle" LCG.

    Full Cycle Linear Congruential Generator

    For a given seed we describe an LCG as full cycle when it will traverse every available state before returning to the seed state.

    So if we have an LCG that is capable of generating 16 pseudorandom numbers, it's a full cycle LCG if it will generate a sequence including each of those numbers before duplicating any of them.

    irb> (1..16).map { x_n = (a * x_n + c) % m }.sort
    # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
    

    These are the additional rules we must use when choosing our starting values to make an LCG full cycle:

    • c can't be 0 (c != 0)
    • m and c are relatively prime (the only positive integer that divides both of them is 1)
    • (a - 1) is divisible by all prime factors of m
    • (a - 1) is divisible by 4 if m is divisible by 4

    The first requirement makes our LCG into a "mixed congruential generator". Any LCG with a non-zero value for c is described as a mixed congruential generator, because it mixes multiplication and addition.

    If c is 0 we call the generator a "multiplicative" congruential generator (MCG), because it only uses multiplication. An MCG is also known as a Lehmer Random Number Generator (LRNG).

    The last 3 requirements in the list up above make a mixed congruential generator into a full cycle LCG. Those 3 rules by themselves are called the Hull-Dobell Theorem.

    Hull-Dobell Theorem

    The Hull-Dobell Theorem describes a mixed congruential generator with a full period (one that generates all values before repeating).
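
    Here is a small helper, purely for illustration, that checks those conditions for a given a, c and m:

    require "prime"
    
    def full_cycle?(a, c, m)
      c != 0 &&
        c.gcd(m) == 1 &&
        Prime.prime_division(m).all? { |prime, _| (a - 1) % prime == 0 } &&
        (m % 4 != 0 || (a - 1) % 4 == 0)
    end
    
    full_cycle?(5, 3, 16)  # => true, the values used in the example above
    full_cycle?(5, 4, 16)  # => false, because 4 and 16 share a common factor
    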

    In Ruby 2.4 Vladimir has implemented an LCG that satisfies the Hull-Dobell Theorem, so Ruby will traverse the entire collection of bins without duplication.

    Remember that the new hash table implementation uses the lower bits of a key's hash to find a bin for our key-index reference, a reference that maps the entry's key to its index in the entries table.

    If the first attempt to find a bin for a key results in a hash collision, future attempts will use a different means of calculating the hash.

    The unused bits from the original hash are used with the collision bin index to generate a new secondary hash, which is then used to find the next bin.

    When the first attempt results in a collision the bin searching function becomes a full cycle LCG, guaranteeing that we will eventually find a home for our reference in the bins array.
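
    The real probing happens in C inside st.c, but the general shape of the search is roughly this (a loose Ruby sketch of the idea, not the actual implementation):

    # Illustrative only: mix previously unused hash bits into an LCG-style step
    # until an empty bin turns up. The real table grows before it ever fills,
    # so the search always terminates.
    def probe_for_bin(bins, hash)
      mask    = bins.size - 1   # bins.size is a power of two
      index   = hash & mask     # first try: the lower bits of the hash
      perturb = hash            # the remaining bits, fed in gradually
    
      until bins[index].nil?
        perturb >>= 11
        index = (index * 5 + perturb + 1) & mask
      end
    
      index
    end
    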

    Since this open addressing approach allows us to store the much smaller references to entries in the bins array, rather than the entirety of the entries themselves, we significantly decrease the memory required to store the bins array.

    The new smaller bins array then increases our chances of taking advantage of the processor caching levels, by keeping this frequently accessed data structure close together in memory. Vladimir improved the data locality of the Ruby hash table.

    So Ruby is Faster and Vladimir Is Smart?

    Yup! We now have significantly faster hashes in Ruby thanks to Vladimir and a whole host of other Ruby contributors. Please make sure you make a point of thanking the Ruby maintainers the next time you see one of them at a conference.

    Contributing to open source can be a grueling and thankless job. Most of the time contributors only hear from users when something is broken, and maintainers can sometimes forget that so many people are appreciating their hard work every day.

    Want to Make a Contribution Yourself?

    The best way to express your gratitude for Ruby is to make a contribution.

    There are all sorts of ways to get started contributing to Ruby; if you're interested in contributing to Ruby itself, check out the Ruby Core community page.

    Another great way to contribute is by testing preview versions as they’re released, and reporting potential bugs on the Ruby issues tracker. Watch the Recent News page (RSS feed) to find out when new preview versions are available.

    If you don't have the time to contribute to Ruby directly consider making a donation to Ruby development:

    Is that everything new in Ruby 2.4?

    Not even close. There are many more interesting updates to be found in the Ruby 2.4 ChangeLog.

    Here are a few of my favorites that I didn't have time to cover:

    Thank you so much for reading, I hope you have a wonderful holiday.

    ":heart:" Jonan

  • Announcing the New Heroku CLI: Performance and Readability Enhancements (Heroku)
    15 Dec 2016 16:16

    Today we are announcing the newest version of the Heroku CLI. We know how much time you spend in the CLI as developers and how much pride you take in being able to get things done quickly. Our new CLI has big improvements in performance as well as enhanced readability for humans and machines.

    Tuned for Performance

    CLI response time is made up of two parts: the API response time and the performance of the CLI itself, and the latter is where we’ve made big improvements. While a typical Unix user should experience responses that are around half a second faster, the biggest gains are for Windows users, as the new CLI no longer has a Ruby wrapper.

    When we measured the time it takes for the info command in the old vs. new CLI, it decreases from 1690 to 1210 milliseconds in Unix, and 3409 to 944 milliseconds in Windows! Though individual results will vary, you should experience faster response times on average across all commands.

    [Image: CLI performance comparison on Windows]

    Installing the New CLI

    You might have noticed some improvements over the last few months, but to get the fastest version you’ll need to uninstall and reinstall the CLI, because we’ve rewritten it in Node.js with new installers. The good news is that this should be the last manual update you’ll ever do for the Heroku CLI: the new CLI will auto-update in the future.

    The instructions to uninstall for Mac OS X users are to type the following:

    $ rm -rf /usr/local/heroku
    $ rm -rf ~/.heroku ~/.local/share/heroku ~/.config/heroku ~/.cache/heroku
    

    Then download and run the OS X installer.

    On Windows, to uninstall the Heroku CLI:

    1. Click Start > Control Panel > Programs > Programs and Features.
    2. Select Heroku CLI, and then click Uninstall.
    3. Delete the .config/heroku directory inside your home directory.

    Then download and run the Windows installer.

    For the last few of you who are still using our very old Ruby gem - now is a great time to upgrade to the full Heroku CLI experience. Please let us know if you run into any issues with installation as we’re here to help!

    Improved Readability for Humans and Machines

    The new CLI includes a number of user experience improvements that we’ve been rolling out over the past few months. Here are some of our favorites.

    grep-parseable Output

    We’ve learned that while you value human-readable output, you want grep-parseable output too. We’ve standardized the output format to make it possible to use grep.

    For example, let’s look at heroku regions. heroku regions at one point showed output like the following:

    [Image: old heroku regions output]

    While this shows all the information about the available regions, and is arguably more readable for humans as it groups the two regions underneath their respective headers, you lose the ability to use grep to filter the data. Here is a better way to display this information:

    [Image: new heroku regions output]

    Now you can use grep to filter just common runtime spaces:

    [Image: heroku regions output filtered with grep]

    Power Up with the jq Tool

    If you want even better tools to work with a richer CLI output, many commands support a --json flag. Use the powerful jq tool to query the results.

    [Image: heroku regions --json output queried with jq]

    $ heroku

    We noticed that heroku was one of the top commands users run. We learned that many users were running it to get a holistic overview of their account. We re-ordered the output so it makes sense to you, showing your starred apps first. We also added context that gives you a Dashboard-style view of the current state of those apps and how they fit into the bigger picture, including pipeline info, last release info, metrics, and errors. At the end of the output, we give guidance on where you might want to go next, such as viewing add-ons or perhaps apps in a particular org.

    [Image: heroku command output with a dashboard-style overview]

    Colors

    We’ve used color to help you quickly read command output. We’ve given some nouns in the CLI standard colors, so that you’ll easily spot them. In the example above you’ll notice that apps are purple, example commands are in blue, and the number of unread notifications is green. We typically specify errors and warning messages in yellow and red.

    We’ve tried to be mindful with color. Too many contrasting colors in the same place can quickly begin to compete for attention and reduce readability. We also make sure color is never the only way we communicate information.

    You can always disable color as a user, by adding --no-color or setting COLOR=false.

    Input Commands: Flags and Args

    Our new CLI makes greater use of flags over args. Flags provide greater clarity and readability, and give you confidence that you are running the command correctly.

    An old heroku fork command would look like this:

    $ heroku fork destapp -a sourceapp
    

    Which app is being forked and which app is the destination app? It’s not clear.

    The new heroku fork has required flags:

    $ heroku fork --from sourceapp --to destapp 
    

    The input flags specify the source and destination with --from and --to so that it’s very clear. You can specify these flags in any order, and still be sure that you will get the correct result.

    Looking to the future, flags will allow us to provide autocomplete in a much better fashion than args. This is because when the user types:

    $ heroku info --app <tab><tab>
    

    ...we know without question that the next thing to complete is an app name and not another flag or other type of argument.

    Learn More

    These are just some examples of the work we’ve been doing to standardize and improve the Heroku CLI user experience. You can read more in the Heroku Dev Center CLI article. We’ve also published a CLI style guide that documents our point of view on CLI design and provides a clear direction for designing delightful CLI plugins.

    As always, we love getting feedback from you so try out the new CLI and let us know your thoughts.

  • A Few Postgres Essentials (Heroku)
    08 Dec 2016 16:44

    Postgres is our favorite database—it’s reliable, powerful and secure. Here are a few essential tips learned from building, and helping our customers build, apps around Postgres. These tips will help ensure you get the most out of Postgres, whether you’re running it on your own box or using the Heroku Postgres add-on.

    Use a Connection Pooler

    Postgres connections are not free, as each established connection has a cost. By using a connection pooler, you’ll reduce the number of connections you use and reduce your overhead.

    Most Postgres client libraries include a built-in connection pooler; make sure you’re using it.

    You might also consider using our pgbouncer buildpack if your application requires a large number of connections. PgBouncer is a server-side connection pooler and connection manager that goes between your application and Postgres. Check out some of our documentation for using PgBouncer for Ruby and Java apps.

    Set an Application Name

    Postgres allows you to see what clients are connected and what each of them is doing using the built-in pg_stat_activity table.

    By explicitly marking each connection you open with the name of your dyno, using the DYNO environment variable, you’ll be able to track what your application is doing at a glance:

    SET application_name TO 'web.1';
    

    Now you will be able to quickly see what each dyno is doing using heroku pg:ps:

    $ heroku pg:ps
     procpid |  source  |   running_for   | waiting |          query
    ---------+----------+-----------------+---------+-----------------------
       31776 | web.1    | 00:19:08.017088 | f       | <IDLE> in transaction
       31912 | worker.1 | 00:18:56.12178  | t       | select * from customers;
    (2 rows)
    

    You will also be able to see how many connections each dyno is using, and much more, by querying the pg_stat_activity table:

    $ heroku pg:psql
    SELECT application_name, COUNT(*) FROM pg_stat_activity GROUP BY application_name ORDER BY 2 DESC;
     application_name  | count 
    -------------------+-------
     web.1             |    15
     web.2             |    15
     worker.1          |     5
    (3 rows)
    

    Set a statement_timeout for Web Dynos

    Long running queries can have an impact on your database performance because they may hold locks or over-consume resources. To avoid them, Postgres allows you to set a timeout per connection that will abort any queries exceeding the specified value. This is especially useful for your web dynos, where you don’t want any requests to run longer than your request timeout.

    SET statement_timeout TO '30s';
    

    Track the Source of Your Queries

    Being able to determine which part of your code is executing a query makes it easier to optimize, and to track down expensive queries or n+1 queries.

    There are many ways to track which part of your code is executing a query, from a monitoring tool like New Relic to simply adding a comment to your SQL specifying what code is calling it:

    SELECT  `favorites`.`product_id` FROM `favorites` -- app/models/favorite.rb:28:in `block in <class:Favorite>'
    

    You will now be able to see the origin of your expensive queries, and be able to track down the caller of the query when using the pg_stat_statements and pg_stat_activity tables:

    $ heroku pg:psql
    SELECT (total_time/sum(total_time) OVER()) * 100 AS exec_time, calls, query FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;
    ----------------------------------------------------------------------------------------------------------------------------
    exec_time | 12.2119460729825
    calls     | 7257
    query     | SELECT  `favorites`.`product_id` FROM `favorites` -- app/models/product.rb:28:in `block in <class:Product>'
    

    Many ORMs provide this feature built in or via extensions; make sure you use it, and your debugging and optimization will be easier.

    Learn More

    There is much more you can learn about Postgres, either via the excellent documentation of the project itself, or the Heroku Postgres Dev Center reference. Share your own tips with the community on the #postgrestips hashtag.

  • PostgreSQL 9.6 Now Generally Available on Heroku (Heroku)
    01 Dec 2016 16:42

    PostgreSQL 9.6 is now generally available for Heroku Postgres. The main focus of this release is performance. PostgreSQL 9.6 includes enhanced parallelism for key capabilities that sets the stage for significant performance improvements for a variety of analytic and transactional workloads.

    With 9.6, certain actions, like individual queries, can be split up into multiple parts and performed in parallel. This means that everything from running queries, creating indexes, and sorting have major improvements that should allow a number of different workloads to execute faster than they had in prior releases of PostgreSQL. With 9.6, the PostgreSQL community, along with Heroku’s own open source contributions to this release (a special thanks to Peter Geoghegan), have laid the foundation to bring those enterprise-class features to the world’s most advanced open source relational database.

    Parallelism, Performance, and Scale

    Performance in the form of parallelism means that more work can be done at the same time. One of the areas where this makes a big difference is when Postgres needs to scan an entire table to generate a result set.

    Imagine for a moment that your PostgreSQL installation has a table in it called emails that stores all of the emails being sent by customers within an application. Let’s say that one of the features that’s provided to customers as part of the application is giving counts on the number of emails being sent to particular email addresses, filtered by the type of person that’s receiving the email. That query might look something like this:

    SELECT e.to
         , count(*) as total
      FROM emails e
     WHERE e.person_type = 'executives'
     GROUP BY e.to
    

    In this scenario, if our customers have been sending a large number of emails to executives, an index on the person_type column would not help, because rows with executives in that column represent too many of the rows in the table. In that case, PostgreSQL will resort to scanning all of the rows in the database to find matches for executives.

    For relatively small tables, say thousands of rows, PostgreSQL might be able to perform this quickly. But, if the table has 100 million rows or more, that query will slow to a crawl because it needs to scan every single row. In the 9.6 release, PostgreSQL will be able to break apart the above query and search portions of the table at the same time. This should greatly speed up queries that require full table scans, which happens more often than you think in analytics-based workloads.

    The performance improvements in 9.6 weren’t limited to sequential scans on large tables. Much of the work Heroku contributed to this release was in the way of improved sorting. One of the areas where you’ll see considerable improvement is when you create indexes concurrently. Under the hood, each row in a table has what’s called a Tuple Id (TID), not to be confused with an Object Id. A TID consists of two parts, a block and a row index. Together, the TID identifies where the row can be found within the physical structure of the table. Our patch to this code took the tuple ids and transformed them into a different format prior to sorting in the index which would allow PostgreSQL to sort the TIDs even faster.

    With our contributions to sorting, when you want to create an index concurrently by using the CREATE INDEX CONCURRENTLY syntax, you can experience up to a 50% performance improvement on index creation in certain cases. This is an amazing patch because when CREATE INDEX CONCURRENTLY is used, it won’t lock writes to the table in question like CREATE INDEX would. This allows your application to operate like it normally would without adverse effects.

    Notable Improvements

    Beyond the work done on parallelism, PostgreSQL 9.6 has a number of noteworthy improvements:

    • The PostgreSQL foreign data wrapper now supports remote updates, joins and batch updates, which means you can distribute workloads across many different PostgreSQL instances.

    • Full text search can now search for adjacent words.

    • Improvements to administrative tasks like VACUUM, which no longer scans pages unnecessarily. This is particularly useful for append-only tables like events or logs.

    Getting Started

    When a new Heroku Postgres database is provisioned on any one of the Heroku plan tiers, whether on the Common Runtime or in Private Spaces, 9.6 will be the default version. If you have an existing database on the platform, please check out our documentation for upgrading. This is an exciting update to PostgreSQL that should have many benefits for the workloads that run on Heroku. Give PostgreSQL 9.6 a spin and let us know how we can make PostgreSQL even better. Together, we can make PostgreSQL one of the best relational databases in the business!

  • Apache Kafka, Data Pipelines, and Functional Reactive Programming with Node.js (Heroku)
    29 Nov 2016 16:42

    Heroku recently released a managed Apache Kafka offering. As a Node.js developer, I wanted to demystify Kafka by sharing a simple yet practical use case with the many Node.js developers who are curious how this technology might be useful. At Heroku we use Kafka internally for a number of uses including data pipelines.  I thought that would be a good place to start.

    When it comes to actual examples, Java and Scala get all the love in the Kafka world.  Of course, these are powerful languages, but I wanted to explore Kafka from the perspective of Node.js.  While there are no technical limitations to using Node.js with Kafka, I was unable to find many examples of their use together in tutorials, open source code on GitHub, or blog posts.  Libraries implementing Kafka’s binary (and fairly simple) communication protocol exist in many languages, including Node.js.  So why isn’t Node.js found in more Kafka projects?

    I wanted to know if Node.js could be used to harness the power of Kafka, and I found the answer to be a resounding yes.

    Moreover, I found the pairing of Kafka and Node.js to be more powerful than expected.  Functional Reactive Programming is a common paradigm used in JavaScript due to the language’s first-class functions, event loop model, and robust event handling libraries.  With FRP being a great tool to manage event streams, the pairing of Kafka with Node.js gave me more granular and simple control over the event stream than I might have had with other languages.

    Continue reading to learn more about how I used Kafka and Functional Reactive Programming with Node.js to create a fast, reliable, and scalable data processing pipeline over a stream of events.

    The Project

    I wanted a data source that was simple to explain to others and from which I could get a high rate of message throughput, so I chose to use data from the Twitter Stream API, as keyword-defined tweet streams fit these needs perfectly.

    Fast, reliable, and scalable.  What do those mean in this context?

    • Fast.  I want to be able to see data soon after it is received -- i.e. no batch processing.
    • Reliable.  I do not want to lose any data.  The system needs to be designed for “at least once” message delivery, not “at most once”.
    • Scalable.  I want the system to scale from ten messages per second to hundreds or maybe thousands and back down again without my intervention.

    So I started thinking through the pieces and drew a rough diagram of how the data would be processed.

    [Diagram: Data Pipeline]

    Each of the nodes in the diagram represents a step the data goes through.  From a very high level, the steps go from message ingestion to sorting the messages by keyword to calculating aggregate metrics to being shown on a web dashboard.

    I began implementation of these steps within one code base and quickly saw my code getting quite complex and difficult to reason about.  Performing all of the processing steps in one unified transformation is challenging to debug and maintain.

    Take a Step Back

    I knew there had to be a cleaner way to implement this.  As a math nerd, I envisioned a way to solve this by composing simpler functions -- maybe something similar to the POSIX-compliant pipe operator that allows processes to be chained together.

    JavaScript allows for various programming styles, and I had approached this initial solution with an imperative coding style.  An imperative style is generally what programmers first learn, and probably how most software is written (for good or bad).  With this style, you tell the computer how you want something done.

    Contrast that with a declarative approach in which you instead tell the computer what you want to be done.  And more specifically, a functional style, in which you tell the computer what you want done through composition of side-effect-free functions.

    Here are simple examples of imperative and functional programming.  Both examples result in the same value.  Given a list of integers, remove the odd ones, multiply each integer by 10, and then sum all integers in the list.

    Imperative

    const numList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    let result = 0;
    for (let i = 0; i < numList.length; i++) {
      if (numList[i] % 2 === 0) {
        result += (numList[i] * 10)
      }
    }
    

    Functional

    const numList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    const result = numList
                   .filter(n => n % 2 === 0)
                   .map(n => n * 10)
                   .reduce((a, b) => a + b, 0)
    

    Both complete execution with result equal to 300, but the functional approach is much easier to read and is more maintainable.

    If that’s not readily apparent to you, here’s why: in the functional example, each function added to the “chain” performs a specific, self-contained operation.  In the imperative example, the operations are mashed together.  In the functional example, state is managed for me within each function, whereas I have to manage changing state (stored in the result variable) during execution in the imperative version.

    These may seem like small inconveniences, but remember this is just a simple example.  In a larger and more complex codebase the minor inconveniences like these accumulate, increasing the cognitive burden on the developer.

    The data processing pipeline steps were screaming to be implemented with this functional approach.

    But What About The Reactive Part?

    Functional reactive programming, “is a programming paradigm for reactive programming (asynchronous dataflow programming) using the building blocks of functional programming (e.g. map, reduce, filter)" [frp].  In JavaScript, functional reactive programming is mostly used to react to GUI events and manipulate GUI data.  For example, the user clicks a button on a web page, the reaction to which is an XHR request which returns a chunk of JSON.  The reaction to the successfully returned chunk of JSON is a transformation of that data to an HTML list, which is then inserted into the webpage’s DOM and shown to the user.  You can see patterns like this in the examples provided by a few of the popular JavaScript functional reactive programming libraries: Bacon.js, RxJS, flyd.

    Interestingly, the functional reactive pattern also fits very well in the data processing pipeline use case.  For each step, not only do I want to define a data transformation by chaining functions together (the functional part), but I also want to react to data as it comes in from Kafka (the reactive part).  I don’t have the luxury of a fixed length numList.  The code is operating on an unbounded stream of values arriving at seemingly random times.  A value might arrive every ten seconds, every second, or every millisecond.  Because of this I need to implement each data processing step without any assumptions about the rate at which messages will arrive or the number of messages that will arrive.

    I decided to use the lodash utility library and Bacon.js FRP library to help with this.  Bacon.js describes itself as, “a small functional reactive programming lib for JavaScript. Turns your event spaghetti into clean and declarative feng shui bacon, by switching from imperative to functional...Stop working on individual events and work with event-streams instead [emphasis added]” [bac].

    Kafka as the Stream Transport

    The use of event streams makes Kafka an excellent fit here.  Kafka’s append-only, immutable log store serves nicely as the unifying element that connects the data processing steps.  It not only supports modeling the data as event streams but also has some very useful properties for managing those event streams.

    • Buffer: Kafka acts as a buffer, allowing each data processing step to consume messages from a topic at its own pace, decoupled from the rate at which messages are produced into the topic.
    • Message resilience: Kafka provides tools to allow a crashed or restarted client to pick up where it left off.  Moreover, Kafka handles the failure of one of its servers in a cluster without losing messages.
    • Message order: Within a Kafka partition, message order is guaranteed. So, for example, if a producer puts three different messages into a partition, a consumer later reading from that partition can assume that it will receive those three messages in the same order.
    • Message immutability: Kafka messages are immutable.  This encourages a cleaner architecture and makes reasoning about the overall system easier.  The developer doesn’t have to be concerned (or tempted!) with managing message state.
    • Multiple Node.js client libraries: I chose to use the no-kafka client library because, at the time, this was the only library I found that supported TLS (authenticated and encrypted) Kafka connections by specifying brokers rather than a ZooKeeper server.  However, keep an eye on all the other Kafka client libraries out there: node-kafka, kafka-node, and the beautifully named Kafkaesque.  With the increasing popularity of Kafka, there is sure to be much progress in JavaScript Kafka client libraries in the near future.

    Putting it All Together: Functional (and Reactive) Programming + Node.js + Kafka

    This is the final architecture that I implemented (you can have a look at more details of the architecture and code here).

    [Diagram: Twitter Data Processing Pipeline Architecture]

    Data flows from left to right.  The hexagons each represent a Heroku app.  Each app produces messages into Kafka, consumes messages out of Kafka, or both.  The white rectangles are Kafka topics.

    Starting from the left, the first app ingests data as efficiently as possible.  I perform as few operations on the data as possible here so that getting data into Kafka does not become a bottleneck in the overall pipeline.  The next app fans the tweets out to keyword- or term-specific topics.  In the example shown in the diagram above, there are three terms.  

    The next two apps perform aggregate metric calculation and related term counting.  Here’s an example of the functional style code used to count the frequency of related terms.  This is called on every tweet.

    function wordFreq(accumulator, string) {
      return _.replace(string, /[\.!\?"'#,\(\):;-]/g, '') //remove special characters
        .split(/\s/)
        .map(word => word.toLowerCase())
        .filter(word => ( !_.includes(stopWords, word) )) //dump words in stop list
        .filter(word => ( word.match(/.{2,}/) )) //dump single char words
        .filter(word => ( !word.match(/\d+/) )) //dump all numeric words
        .filter(word => ( !word.match(/http/) )) //dump words containing http
        .filter(word => ( !word.match(/@/) )) //dump words containing @
        .reduce((map, word) =>
          Object.assign(map, {
            [word]: (map[word]) ? map[word] + 1 : 1,
          }), accumulator
        )
    }
    

    A lot happens here, but it’s relatively easy to scan and understand.  Implementing this in an imperative style would require many more lines of code and be much harder to understand (i.e. maintain).  In fact, this is the most complicated data processing step.  The functional implementations for each of the other data processing apps are even shorter.

    Finally, a web application serves the data to web browsers with some beautiful visualizations.

    [Image: Kafka Twitter Dashboard]

    Summary

    Hopefully, this provided you with not only some tools but also the basic understanding of how to implement a data processing pipeline with Node.js and Kafka.  We at Heroku are really excited about providing the tools to make evented architectures easier to build out, easier to manage, and more stable.

    If you are interested in deploying production Apache Kafka services and apps at your company, check out our Apache Kafka on Heroku Dev Center Article to get started.

  • Now GA: Read and Write Postgres Data from Salesforce with Heroku External Objects (Heroku)
    15 Nov 2016 16:09

    Today we are announcing a significant enhancement to Heroku External Objects: write support. Salesforce users can now create, read, update, and delete records that physically reside in any Heroku Postgres database from within their Salesforce deployment.

    Increasingly, developers need to build applications with the sophistication and user experience of the consumer Internet, coupled with the seamless customer experience that comes from integration with Salesforce. Heroku External Objects enable a compelling set of integrations scenarios between Heroku and Salesforce deployments, allowing Postgres to be updated based on business processes or customer records in Salesforce.

    With Heroku External Objects, data persisted in Heroku Postgres is presented as an external object in Salesforce. External objects are similar to custom objects, except that they map to data located outside your Salesforce org, and are made available by reference at run time.

    Integration with Salesforce Connect

    Heroku External Objects is built to seamlessly integrate with Salesforce Connect using the OData 4.0 standard. Salesforce Connect enables access to data from a wide variety of external sources, in real-time, without the need to write and maintain integration code. This ‘integration by reference’ approach has a number of compelling benefits:

    • Efficiency: Fast time to value, absence of custom integration code, and reduced storage footprint.

    • Low Latency: Accessing external objects results in data being fetched from the external system in real time, eliminating the risk of data becoming stale over time.

    • Flexibility: External objects in Salesforce share many of the same capabilities as custom objects such as the ability to define relationships, search, expose in lists and chatter feeds, and support for CRUD operations.

    • Platform Integration: External objects can be referenced in Apex, Lightning and VisualForce, and accessed via the Force.com APIs.

    Common Usage Patterns

    We have many Heroku Postgres customers with multi-terabyte databases, which are used in service to an incredibly diverse range of applications. When it comes to integrating this data with Salesforce, we tend to see two, non-exclusive integration patterns: Salesforce as source of truth and Postgres as source of truth.

    Salesforce as source of truth scenarios often entail updates originating from an external application to core Salesforce objects such as Orders, Accounts, and Contacts. Because this data inherently belongs in Salesforce, Heroku Connect synchronization is the preferred solution. With Heroku Connect, you can configure high-scale, low latency data synchronization between Salesforce and Postgres in a handful of mouse clicks.

    Postgres as source of truth scenarios typically require exposing discrete, contextually informed data points, such as an order detail, external status, or computed metric within Salesforce. Physically copying this type of data into Salesforce would be inefficient and result in some degree of latency. Heroku External Objects allows data in Postgres to be exposed as a Salesforce external object, which is queried on access to facilitate real-time integration.

    Heroku External Objects is the newest data integration service in Heroku Connect and is available today. For more information and documentation, visit the Heroku Connect page, the Heroku Dev Center, or the documentation on Force.com. For more information on Salesforce Connect, head on over to Trailhead.

  • Ruby 3x3: Matz, Koichi, and Tenderlove on the future of Ruby Performance (Heroku)
    10 Nov 2016 13:56

    At RubyKaigi I caught up with Matz, Koichi, and Aaron Patterson aka Tenderlove to talk about Ruby 3x3 and our path so far to reach that goal. We discussed Koichi’s guild proposal, just-in-time compilation and the future of Ruby performance.

    Jonan: Welcome everyone. Today we are doing an interview to talk about new features coming in Ruby 3. I am here with my coworkers from Heroku, Sasada Koichi and Yukihiro Matsumoto, along with Aaron Patterson from GitHub.

    Jonan: So, last year at RubyKaigi you announced an initiative to speed up Ruby by three times by the release of version three. Tell us more about Ruby 3x3.

    Matz: In the design of the Ruby language we have been primarily focused on productivity and the joy of programming. As a result, Ruby was too slow, because we focused less on run-time efficiency, so we’ve tried to do many things to make Ruby faster. For example the engine in Ruby 1.8 was very slow; it was written by me. Then Koichi came in and we replaced the virtual machine. The new virtual machine runs many times faster. Ruby and the Ruby community have continued to grow, and some people still complain about the performance. So we are trying to do new things to boost the performance of the virtual machine. Even though we are an open source project and not a business, I felt it was important for us to set some kind of goal, so I named it Ruby 3x3. The goal is to make Ruby 3 run three times faster as compared to Ruby 2.0. Other languages, for example Java, use the JIT technique, just in time compilation; we don't use that yet in Ruby. So by using that kind of technology and with some other improvements, I think we can accomplish the three times boost.

    Aaron: So it’s called Ruby 3x3, three times three is nine and JRuby is at nine thousand. Should we just use JRuby?

    Jonan: Maybe we should. So Ruby 3x3 will be three times faster. How are you measuring your progress towards that goal? How do we know? How do you check that?

    Matz: Yes, that's an important point. So in the Ruby 3x3 project, we are comparing the speed of Ruby 3.0 with the speed of Ruby 2.0. We have completed many performance improvements in Ruby 2.1 and 2.3, so we want to include that effort in Ruby 3x3. The baseline is Ruby 2.0; that's the clarification.

    Aaron: So your Rails app will not likely be three times faster on Ruby 3?

    Matz: Yeah. Our simple micro-benchmark may run three times faster, but we are worried that a real-world application may be slower; it could happen. So we are going to set up some benchmarks to measure Ruby 3x3. We will measure our progress towards this three times goal using those benchmark suites. We haven't set them all up yet but they likely include at least optcarrot (an NES emulator) and some small Rails applications, because Rails is the major application framework for the Ruby language. We’ll include several other types of benchmarks as well. So we have to set that up; we are going to set up the benchmark suites.

    Jonan: So, Koichi recently made some changes to GC in Ruby. We now use a generational garbage collector. Beyond the improvements that have been made already to GC, what possibility is there for more improvement that could get us closer to Ruby 3x3? Do you think the GC changes are going to be part of our progress there?

    Koichi: As Matz says Ruby’s GC is an important program, it has a huge overhead. However, the recent generational garbage collector I don't think has nearly as much overhead. Maybe only ten percent of Ruby’s time is spent in GC, or something like that. If we can speed up garbage collection an additional ten times, it's still only ten percent of the overall time. So sure we should do more for garbage collection, but we have lots of other more impactful ideas. If we have time and specific requests for GC changes, we will certainly consider those.

    Aaron: … and resources...

    Koichi: Yes.

    Aaron: The problem is, since, for us at GitHub we do out-of-band garbage collections, garbage collection time makes no difference on the performance of the requests anyway. So even if garbage collection time is only ten percent of the program and we reduce that to zero, say garbage collection takes no time at all, that's not three times faster so we wouldn't make our goal anyway. So, maybe, GC isn't a good place to focus for the Ruby 3x3 improvements.

    Matz: Yeah we have already added the generational garbage collector and incremental garbage collection. So in some cases, some applications, large web applications for example, may no longer need to do that out-of-band garbage collection.

    Aaron: Yeah, I think the only reason we are doing it is because we are running Ruby 2.1 in production but we're actually on the path to upgrading. We did a lot of work to get us to a point where we could update to Ruby 2.3, it may be in production already. My team and I did the work, somebody else is doing the deployment of it, so I am not sure if it is in production yet but we may soon be able to get rid of out-of-band collection anyway.

    Matz: Yes in my friend's site, out-of-band collection wasn’t necessary after the deployment of Ruby 2.3.

    Jonan: So the GC situation right now is that GC is only maybe about ten percent of the time it takes to run any Ruby program anyway. So, even if we cut that time by half, we're not going to win that much progress.

    Matz: It's no longer a bottleneck so the priority is lower now.

    Jonan: At RailsConf, Aaron talked about memory and memory fragmentation in Ruby. If I remember correctly it looked to me like we were defragging memory, which is addressed, so in my understanding that means that we just point to it by the address; we don't need to put those pieces of memory close together. I'm sure there's a reason we might want to do that; maybe you can explain it, Aaron.

    Aaron: Sure. So, one of the issues that we had at, well, we have this issue at GitHub too, is that our heap gets fragmented. We use forking processes, our web server forks, and eventually it means that all of the memory pages get copied out at some point. This is due to fragmentation. When you have a fragmented heap, when we allocate objects, we are allocating into those free slots and so since we're doing writes into those slots, it will copy those pages to child processes. So, what would be nice, is if we could eliminate that fragmentation or reduce the fragmentation and maybe we wouldn't copy the child pages so much. Doing that, reducing the fragmentation like that, can improve locality but not necessarily. If it does, if you are able to improve the locality by storing those objects close to each other in memory, they will be able to hit caches more easily. If they hit those caches, you get faster access, but you can't predict that. That may or may not be a thing, and it definitely won't get us to Ruby 3x3.

    Jonan: Alright.

    Matz: Do you have any proof on this? Or a plan?

    Aaron: Any plan? Well yes, I prepared a patch that...

    Matz: Making it easier to separate the heap.

    Aaron: Yes, two separate heaps. For example with classes or whatever types with classes, we’ll allocate them into a separate heap, because we know that classes are probably not going to get garbage collected so we can put those into a specific location.

    Koichi: Do you have plans to use threads at GitHub?

    Aaron: Do I have plans to use threads at GitHub? Honestly, I don't know. I doubt it. Probably not. We'll probably continue to use unicorn in production. Well I mean we could but I don't see why. I mean we're working pretty well and we're pretty happy using unicorn in production so I don't think we would switch. Honestly, I like the presentation that you gave about guilds, if we could use a web server based on guilds, that would be, in my opinion, the best way.

    Matz: Yes, I think it's promising.

    Jonan: So these guilds you mentioned (Koichi spoke about guilds at RubyKaigi), maybe now is a good time to discuss that. Do you want to tell us about guilds? What they are and how they affect plans for Ruby 3x3?

    Matz: We have three major goals in Ruby 3. One of them is performance, which is that our program is really running three times faster. The second goal is the concurrency model, which is implemented by something like Ruby guilds.

    Koichi: So concurrency and parallelism utilize some CPU cores.

    Matz: Yeah, I say concurrency just because the guild is the concurrency model from the programmer's view. Implementation-wise it should be parallelism.

    Koichi: I'm asking about the motivation of the concurrency.

    Matz: Motivation of the concurrency?

    Koichi: Not only the performance but also the model.

    Matz: Well we already have threads. Threads are mostly OK but they don't run in parallel, due to the existing GIL. So guilds are a performance optimization. Concurrency by guilds may make the threading program or the concurrency runtime program faster, but the main topic is the data abstraction for concurrent projects.

    Jonan: OK. So while we are on the topic of threads I am curious. I've heard people talk about how it might be valuable to have a higher level of abstraction on top of threads because threads are quite difficult to use safely. Have you all thought about adding something in addition to threads that maybe protects us from ourselves a little bit around some of those problems? Is that what guilds are?

    Aaron: Yes, that's essentially what the guild is, it's a higher level abstraction so you can do parallel programming safely versus threads where it's not safe at all. It's just...

    Koichi: Yes, so it's a problem with concurrency in Ruby now; sharing mutable objects between threads. The idea of guilds, the abstraction more than guilds specifically, is to prohibit sharing of mutable objects.

    Jonan: So when I create an object how would I get it into a guild? If I understand correctly, you have two guilds - A and B - and they both contain mutable objects. With the objects in A, you could run a thread that used only those objects, and run a different thread that used only objects in B, and then you would eliminate this problem and that's why guilds will exist. But how do I put my objects into guilds or move them between guilds? Have you thought about it that far yet?

    Matz: Yeah, a guild is like some kind of bin, a container with objects. You cannot access the objects inside the guild from outside, because the objects are members of the guild. However, you can transfer the objects from one guild to another. So, after transferring, the objects can be accessed in the destination guild.

    Jonan: I see, OK. So the objects that are in a guild can't be accessed from outside that guild; other guilds can't get access to them. Then immutable objects are not members of guilds. They are outside.

    Koichi: So immutable objects are something like freelance objects. Freelance objects are immutable, so any guild can access them because there are no read-write conflicts.

    Jonan: So you would just use pointers to point to those immutable objects?

    Koichi: Yes. Also, I want to note that immutable doesn't mean frozen object. Frozen objects can contain mutable objects. So I mean those immutable objects which only contain children that point to immutable objects.

    Jonan: So if we had a nested hash, some large data structure, we would need to freeze every object in that in order to reference it this way. Is there a facility in Ruby right now to do that? I think I would have to iterate over that structure freezing objects manually today.

    Matz: Not yet.

    Jonan: So there might be?

    Matz: We need to provide something to freeze these objects.

    Aaron: A deep freeze.

    Matz: Yes, deep freeze.

    Jonan: Deep Freeze is the name of this feature maybe? I think that would be an excellent name for it.

    Aaron: I like deep freeze. (Koichi would like to note that the name for this feature has not yet been determined)
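
    (For the curious, here is a minimal sketch of what such a recursive deep freeze could look like today, using only methods that already exist in Ruby; the deep_freeze name is purely illustrative, since no such method ships with Ruby.)

    # Illustrative sketch only: Ruby has no built-in deep freeze, so a helper
    # has to walk the structure and freeze every element it finds.
    def deep_freeze(obj)
      case obj
      when Hash
        obj.each { |key, value| deep_freeze(key); deep_freeze(value) }
      when Array
        obj.each { |element| deep_freeze(element) }
      end
      obj.freeze
    end

    config = { name: "reader", topics: ["tweets", "word-counts"] }
    deep_freeze(config)
    config.frozen?             # => true
    config[:topics].frozen?    # => true
    config[:topics][0].frozen? # => true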

    Jonan: I think you mentioned it earlier but maybe you could tell us a little more about just in time compilation, the JIT, and how we might approach that in Ruby 3.

    Matz: The JIT is a pretty attractive technology for gaining performance. You know, as part of the Ruby 3x3 effort we are probably going to introduce some kind of JIT. Many other virtual machines have introduced the LLVM JIT. However, personally, I don't want to use the LLVM JIT for Ruby 3, just because the LLVM itself is a huge project, and it's much younger than Ruby. Ruby is more than twenty years old. It's possibly going to live for twenty more years, or even longer, so relying on other huge projects is kind of dangerous.

    Aaron: What do you think of Shyouhei’s stuff?

    Matz: The optimizer?

    Aaron: Yeah.

    Matz: Yeah, it's quite interesting, but its application is kind of limited. We have to measure it.

    Koichi: I think Shyouhei’s project is a good first step, but we need more time to consider it.

    Jonan: Can you explain what it is?

    Aaron: Yeah, so Shyouhei, what he did was he...

    Matz: De-optimization.

    Aaron: Yeah he introduced a de-optimization framework that essentially lets us copy old instructions, or de-optimized instructions, into the existing instruction sequences. So he can optimize instructions and if anything happens that would… well, I guess I should step back a little bit. So if you write, in Ruby, 2 + 4, typically the plus operator is not overridden. So if you can make that assumption then maybe we can collapse that down and replace it with just six. Right?

    Jonan: I see.

    Aaron: But if somebody were to override the plus method, we would have to not do that optimization, because we wouldn't know what plus does. And in order to do that, we have to de-optimize and go back to the original instructions that we had before. So, what Shyouhei did was he introduced this de-optimization framework. It would allow us to take those old instructions and copy them back in, in case someone were to do something like what I described, overriding plus.
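
    (To make that concrete, here is plain Ruby, not Shyouhei's actual patch, showing how redefining plus at runtime invalidates the assumption an optimizer would like to rely on:)

    p 2 + 4            # => 6; Integer#+ is untouched, so an optimizer could
                       # safely fold this expression down to the literal 6

    class Integer
      def +(_other)    # redefining plus at runtime breaks that assumption
        42
      end
    end

    p 2 + 4            # => 42; the VM now has to fall back (de-optimize)
                       # to the original, unfolded instructions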

    Matz: JRuby people implemented very nice de-optimization technologies. They made just such a de-optimization framework on the Java Virtual Machine, so on this topic at least they are a bit ahead of us.

    Aaron: Well the one thing, the other thing that I don't know; if you watch the JRuby or JRuby Plus Truffle stuff, if you read any of the papers about it, there are tradeoffs; the JIT isn't free. I mean we have to take into consideration how much memory usage that will require. People hearing this shouldn't think "oh well let's just add a JIT, that's all we have to do and then it will be done". It’s much harder; there are more tradeoffs than simply adding a JIT.

    Jonan: Yes. So there was an older implementation, RuJIT, the Ruby JIT, but RuJIT had some memory issues didn't it?

    Koichi: Yes, quite severe. It needed a lot of memory. Such memory consumption is controllable, however, so we can configure how much memory it can use.

    Jonan: OK, so you just set a limit for how much the JIT uses and then it would do the best it could with what you had given it, basically?

    Koichi: Yeah.

    Jonan: OK.

    Koichi: RuJIT can improve the performance of micro-benchmarks but I’m not sure about the performance in larger applications.

    Jonan: So, for Rails applications maybe we should call it "Ruby 1.2x3" or something.

    Aaron: I think that's an interesting question to bring up because if a Rails application is part of the base benchmarks, are we really going to make a Rails application three times faster?

    Matz: We need to make our performance number calculations pretty soon. This is a big problem I think. So maybe some kind of operation such as concatenating...

    Aaron: Concatenation, yeah.

    Matz: … or temporary variable creation or something like that, we can improve the performance.

    Aaron: So, I think it's interesting if we come up with a benchmark that's using string concatenation. I mean we could use an implementation for that. For example, what if we used ropes instead. If we did that, maybe string concatenation would become very fast, but we didn't really improve the virtual machine at all, right? So, how do we balance, does that make sense? How do we balance those things?

    Matz: So unlike the typical application, the language can be applied anywhere, so it can be used to write Rails applications, or science applications, or games, so I don't think improving that one thing will necessarily change that situation. So we have to do everything, maybe introducing ropes, introducing a JIT in some form, introducing some other mechanisms as well to see that improvement. We have to do it.

    Aaron: So maybe the key is in the benchmarks that we have. We have something doing a lot of string concatenations, something doing a lot of math, maybe something doing, I don't know, I/O. Something like that?

    Matz: Yeah. We have to. We cannot be measured by one single application, we need several.

    Aaron: Right.

    Matz: And then in the Rails application we have to avoid the database access. Just because, you know, access to the database is slow, can be very slow. That we cannot improve.

    Jonan: So, along with the JIT, you've also talked about some type changes coming to Ruby 3 and the optional static types. Can you tell us about that?

    Matz: Yeah, the third major goal of Ruby 3 is adding some kind of static typing while keeping the duck typing, so some kind of structure for soft-typing or something like that. The main goal of the type system is to detect errors early. So adding this kind of static type check or type interfaces does not affect runtime.

    Matz: It’s just a compile time check. Maybe you can use that kind of information in IDEs so that the editors can use that data for their code completion or something like that, but not for performance improvement.

    Aaron: You missed out on a really good opportunity for a pun.

    Jonan: Did I? What was the pun?

    Aaron: You should have said, "What type of changes will those be?"

    Jonan: What type of type changes would those be? Yes. I've been one-upped once again, by pun-master Aaron here.

    Aaron: I was holding it in, I really wanted to say something.

    Jonan: You looked up there suddenly and I thought, did I move on too early from the JIT discussion? No, it was a pun. That was the pun alert face that happened there, good. I'm sorry that we missed the pun. So, to summarize then, the static type system is not something that will necessarily improve performance...

    Koichi: Yes.

    Jonan: ...but it would be an optional static type system, and it would allow you to check some things before you're running your program and actually running into errors.

    Matz: Yeah, and if you catch those errors early you can improve your productivity.

    Jonan: Yes, developer productivity.

    Matz: Yeah.

    Jonan: Which is, of course, the primary goal of Ruby, or developer happiness rather, not necessarily productivity. So, the JIT, this just in time compiler, right now Ruby has ahead of time compilation (AOT) optionally? There's some kind of AOT stuff that you can do in Ruby?

    Matz: I don't code with it.

    Aaron: “Some, kind of”.

    Jonan: OK.

    Aaron: It has a framework built in to allow you to build your own AOT compiler. It has the tools in there to let you build an AOT compiler, and I think you wrote a gem, the...

    Koichi: Yeah, Yomikomu.

    Aaron: Yeah.

    Jonan: OK. Yomikomu is an AOT compiler for Ruby. Can you describe just a little bit what that means? What ahead of time compilation would mean in this case? What does it do?

    Koichi: Ruby compiles at runtime, so we could store the compiled binary to the file system or something, some database or somewhere. The Yomikomu gem uses this feature, writing out instruction sequences to the file system at runtime, so we can skip the compile step in the future. It’s only a small improvement, I think, maybe 30%.

    Aaron: 30%?

    Matz: 30% is huge.

    Aaron: Yeah!

    Jonan: That seems like a pretty good improvement to me.

    Koichi: I do think so.

    Aaron: We just need a few more 30% improvements then Ruby 3x3 is done.

    Matz: Yeah, that means 30% of the time is spent in the compiler.

    Koichi: Yeah, in 2.3.

    Matz: That’s huge!

    Aaron: That's what I said!
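
    (The mechanism Koichi described is built on RubyVM::InstructionSequence, which since Ruby 2.3 can dump compiled bytecode to a binary blob and load it back. Below is a rough sketch of that idea, not the Yomikomu code itself; the file paths are made up for illustration.)

    require "fileutils"

    # First boot: compile a Ruby file and cache its bytecode on disk.
    FileUtils.mkdir_p("tmp/iseq_cache")
    iseq = RubyVM::InstructionSequence.compile_file("app/models/user.rb")
    File.binwrite("tmp/iseq_cache/user.bin", iseq.to_binary)

    # Later boots: skip parsing and compiling, load the cached bytecode instead.
    binary = File.binread("tmp/iseq_cache/user.bin")
    RubyVM::InstructionSequence.load_from_binary(binary).eval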

    Jonan: So, rather than JIT, have you thought about maybe like a little too late compiler? We could just compile after the program runs and we don't need to compile it all then. Maybe it wouldn’t be as popular as a just in time compiler.

    Aaron: One thing I think would be interesting, one thing that I'd like to try someday, is to take the bytecode that's been written out and analyze it. So we could know for example that we can use this trick that Shyouhei’s doing with constant folding. Since we have all of the bytecode written out, you should be able to tell by analyzing the bytecode whether or not... actually maybe you couldn't tell that. I was going to say we could analyze the bytecode and optimize it with code, rewriting an optimized version to disk. But since you can do so much stuff at runtime, I don't know if it would work in all cases.

    Koichi: This is exactly what the JIT or some kind of comparable approach aims to do.

    Aaron: Yeah.

    Jonan: So, cases like you were talking about earlier where this plus can be overridden in Ruby, so what you would do is assume the plus is not overridden and you would just put six; you would actually write that into the bytecode, just the result of this value. Then this framework would allow you later, if someone did override the plus method dynamically while the program was running, to swap it out again for the old implementation.

    Aaron: Yes.

    Jonan: OK.

    Aaron: So basically the public service announcement is: "don't do that."

    Jonan: Don't do that. Don't override plus.

    Aaron: Just stop it.

    Jonan: Just stop it. You're going to make the Ruby team's life harder.

    Koichi: Yes, lots harder.

    Jonan: OK. Is there anything else you would like to add about Ruby 3? Anything we didn't touch on today that might be coming?

    Matz: You know, we’ve been working on Ruby 3 for maybe two years right now, but we are not expecting to release in a year or even two. Maybe by 2020?

    Aaron: Does that mean that we have to wait, are we really going to wait for Ruby 3 to introduce guilds? Or are we going to introduce that before Ruby 3?

    Matz: Before Ruby 3 I guess.

    Aaron: OK.

    Matz: Yeah, we still have a lot of things to do to implement guilds.

    Aaron: Of course.

    Matz: For example, the garbage collection is pretty difficult. The isolated threads can't access the same objects in that space, so it will be very difficult to implement garbage collection. I think we’ve had a lot of issues with that in the past, so that could take years. But if we’re done, we are happy to introduce guilds into maybe Ruby 2... 6?

    Aaron: 2.6, yeah.

    Matz: So this is because we don't want to break compatibility. So if a program isn’t using guilds it should run the same way.

    Jonan: So this is how we are able to use immutable objects in Ruby, but they’re frozen objects. They can’t be unfrozen.

    Matz: No.

    Jonan: OK.

    Koichi: Freezing is a one-way operation.

    Aaron: Yes.

    Jonan: OK. So then, a friend asked me when I described guilds, he writes a lot of Haskell, he asked me when we are going to have "real immutable objects", and I don't quite know what he means. Is there some distinction between an immutable object in Ruby and an immutable object in a different language that’s important?

    Matz: For example in Haskell, everything is immutable, it’s that kind of language, everything is immutable from day one.

    Jonan: Yes.

    Matz: But in Ruby we have mutable objects, so under that kind of situation we need a whole new construct.

    Aaron: Frozen objects should really be immutable. It's really immutable.

    Jonan: OK.

    Aaron: I don't...

    Jonan: You don't know what this person who air-quoted me "real immutable" was saying?

    Aaron: Yeah I don't know why they would say "real immutable".

    Jonan: Should I unfriend him on Facebook? I think I'm going to after this.

    Matz: At least tell him if you want "real immutable" go ahead and use Haskell.

    Jonan: I think that's an excellent option, yeah.

    Aaron: You just need to say to them, quit "Haskelling" me.

    Jonan: I should, I’ll just tell them to quit "Haskelling" me about immutable objects. Well, it has been a pleasure. Thank you very much for taking the time. We've run a little bit longer than promised but I think it was very informative, so hopefully people get a lot out of it. Thank you so much for being here.
