6 results back to index

pages: 348 words: 39,850

Data Scientists at Work by Sebastian Gutierrez


Albert Einstein, algorithmic trading, bioinformatics, bitcoin, business intelligence, chief data officer, clean water, cloud computing, computer vision, continuous integration, correlation does not imply causation, crowdsourcing, data is the new oil, DevOps, domain-specific language, follow your passion, full text search, informal economy, information retrieval, Infrastructure as a Service, inventory management, iterative process, linked data, Mark Zuckerberg, microbiome, Moneyball by Michael Lewis explains big data, move fast and break things, natural language processing, Network effects, nuclear winter, optical character recognition, pattern recognition, Paul Graham, personalized medicine, Peter Thiel, pre–internet, quantitative hedge fund, quantitative trading / quantitative finance, recommendation engine, Renaissance Technologies, Richard Feynman, Richard Feynman, self-driving car, side project, Silicon Valley, Skype, software as a service, speech recognition, statistical model, Steve Jobs, stochastic process, technology bubble, text mining, the scientific method, web application

So even though we may hire people who come in with very little programming experience, we work very hard to instill in them very quickly the importance of engineering, engineering practices, and a lot of good agile programming practices. This is helpful to them and us, as these can all be applied almost one-to-one to data science right now. If you look at dev ops right now, they have things such as continuous integration, continuous build, automated testing, and test harnesses—all of which map very well from the dev ops world to the data ops (a phrase I stole from Red Monk) world very easily. I think this is a very powerful notion. It is important to have testing frameworks for all of your data, so that if you make a code change, you can go back and test all of your data.

So I have to stay on my toes and keep the internal customer in mind; a designer is going to understand and interact with data products and research in a fundamentally different way than a developer, so I have to keep my ears open and my communication skills—verbal and written—sharp. When you talk to a designer, the way that they think about a problem and the tools they employ to solve that problem are going to be very different from, say, a data scientist or very different from a dev ops person. Those differing perspectives are healthy. There’s a—perhaps apocryphal—operations research story about an operations research [OR] person who was hired to fix an elevator-scheduling problem. The OR person initially thinks that they should build a model to solve the rush-hour traffic problem at this elevator bank.

I would say that about 60 percent of the time we involve engineering, and about 40 percent of the time we do it ourselves. If we need something really performant and it is complicated and involves a lot of configuration, then we always involve engineering there. Eventually there is a process to migrate the prototype to production code. Engineering will push our combined work to the dev ops group, which is where it is moved into production. Then we monitor it and hopefully never touch it again. Gutierrez: How do you do the scaling test? Lenaghan: We slowly step up the scale of data we run through the prototype in two dimensions. We have the geospatial dimension, which is large, but not extremely large.


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst


algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

Training is a prerequisite for understanding the paradigm shift that Big Data offers. Without that insider knowledge, it becomes difficult to explain and communicate the value of data, especially when the data are public in nature. Next on the list is the integration of development and operations teams (known as DevOps), the people most likely to deal with the burdens of storing and transforming the data into something usable. Much of the process of moving forward will lie with the business executives and decision makers, who will also need to be brought up to speed on the value of Big Data. The advantages must be explained in a fashion that makes sense to the business operations, which in turn means that IT pros are going to have to do some legwork.

See Business intelligence (BI) Big Data and Big Data analytics analysis categories application platforms best practices business case development challenges classifications components defined evolution of examples of 4Vs of goal setting introduction investment in path to phases of potential of privacy issues processing role of security (See Security) sources of storage team development technologies (See Technologies) value of visualizations Big Science BigSheets Bigtable Bioinformatics Biomedical industry Blekko Business analytics (BA) Business case best practices data collection and storage options elements of introduction Business intelligence (BI) as Big Data analytics foundation Big Data analytics team incorporation Big Data impact defined extract, transform, and load (ETL) information technology and in-memory processing limitations of marketing campaigns risk analysis storage capacity issues unstructured data visualizations Business leads Business logic Business objectives Business rules C Capacity of storage systems Cassandra Census data CERN Citi Classification of data Cleaning Click-stream data Cloud computing Cloudera Combs, Nick Commodity hardware Common Crawl Corpus Communication Competition Compliance Computer security officers (CSOs) Consulting firms Core capabilities, data analytics team Costs Counterintelligence mind-set CRUD (create, retrieve, update, delete) applications Cryptographic keys Culture, corporate Customer needs Cutting, Doug D Data defined growth in volume of value of See also Big Data and Big Data analytics Data analysis categories challenges complexity of as critical skill for team members data accuracy evolution of importance of process technologies Database design Data classification Data discovery Data extraction Data integration technologies value creation Data interpretation Data manipulation Data migration Data mining components as critical skill for team members defined examples methods technologies Data modeling Data protection. See Security Data retention Data scientists Data sources growth of identification of importation of data into platform public information Data visualization Data warehouses DevOPs Discovery of data Disk cloning Disruptive technologies Distributed file systems. See also Hadoop Dynamo E e-commerce Economist e-discovery Education 80Legs Electronic medical records compliance data errors data extraction privacy issues trends Electronic transactions EMC Corporation Employees data analytics team membership monitoring of training Encryption Entertainment industry Entity extraction Entity relation extraction Errors Event-driven data distribution Evidence-based medicine Evolution of Big Data algorithms current issues future developments modern era origins of Expectations Expediency-accuracy tradeoff External data Extract, transform, and load (ETL) Extractiv F Facebook Filters Financial controllers Financial sector Financial transactions Flexibility of storage systems 4Vs of Big Data G Gartner General Electric (GE) Gephi Goal setting Google Google Books Ngrams Google Refine Governance Government agencies Grep H Hadoop advantages and disadvantages of design and function of event-processing framework future origins of vendor support Yahoo’s use HANA HBase HDFS Health care Big Data analytics opportunities Big Data trends compliance evolution of Big Data See also Electronic medical records Hibernate High-value opportunities History.


pages: 234 words: 63,522

Puppet Essentials by Felix Frank


cloud computing, Debian, DevOps, domain-specific language, Infrastructure as a Service, platform as a service, web application

Table of Contents Preface 1 Chapter 1: Writing Your First Manifests 7 Getting started 8 Introducing resources and properties 10 Interpreting the output of the puppet apply command 11 Dry-testing your manifest 12 Adding control structures in manifests 13 Using variables 14 Variable types 14 Controlling the order of evaluation 16 Declaring dependencies 17 Error propagation 20 Avoiding circular dependencies 21 Implementing resource interaction 22 Examining the most notable resource types 25 The user and group types 26 The exec resource type 27 The cron resource type 29 The mount resource type 29 Summary 30 Chapter 2: The Master and Its Agents The Puppet master Setting up the master machine Creating the master manifest Inspecting the configuration settings Setting up the Puppet agent The agent's life cycle 31 31 32 33 35 35 38 Table of Contents Renewing an agent's certificate 40 Running the agent from cron 41 Performance considerations 42 Switching to Phusion Passenger 43 Using Passenger with Nginx 45 Basic tuning 46 Troubleshooting SSL issues 47 Summary 48 Chapter 3: A Peek Under the Hood – Facts, Types, and Providers Summarizing systems with Facter Accessing and using fact values Extending Facter with custom facts Simplifying things using external facts 49 50 52 53 55 Goals of Facter 57 Understanding the type system 57 The resource type's life cycle on the agent side 58 Substantiating the model with providers 59 Providerless resource types 61 Summarizing types and providers 61 Putting it all together 62 Summary 64 Chapter 4: Modularizing Manifests with Classes and Defined Types Introducing classes and defined types Defining and declaring classes Creating and using defined types Understanding and leveraging the differences Structured design patterns Writing comprehensive classes Writing component classes Using defined types as resource wrappers Using defined types as resource multiplexers Using defined types as macros Exploiting array values using defined types Including classes from defined types Nesting definitions in classes Establishing relationships among containers Passing events between classes and defined types [ ii ] 65 66 66 67 69 71 71 73 74 76 77 78 81 82 83 83 Table of Contents Ordering containers 86 Limitations 86 Performance implications of container relationships 89 Mitigating the limitations 90 The anchor pattern The contain function 90 91 Making classes more flexible through parameters 92 Caveats of parameterized classes 92 Preferring the include keyword 93 Summary 94 Chapter 5: Extending Your Puppet Infrastructure with Modules 95 An overview of Puppet's modules Parts of a module How the content of each module is structured Documentation in modules Maintaining environments Configuring environment locations Obtaining and installing modules Modules' best practices Putting everything in modules Avoiding generalization Testing your modules 96 96 97 98 99 100 101 102 102 103 104 Building a specific module Naming your module Making your module available to Puppet Implementing the basic module functionality Creating utilities for derived manifests 105 106 106 106 110 Safe testing with environments Adding configuration items Allowing customization Removing unwanted configuration items Dealing with complexity Enhancing the agent through plugins Replacing a defined type with a native type 104 111 113 114 115 116 118 Enhancing Puppet's system knowledge through facts 125 Refining the interface of your module through custom functions 126 Making your module portable across platforms 128 Finding helpful Forge modules 130 Identifying modules' characteristics 130 Summary 131 [ iii ] Table of Contents Chapter 6: Leveraging the Full Toolset of the Language 133 Chapter 7: Separating Data from Code Using Hiera 157 Templating dynamic configuration files 134 Learning the template syntax 134 Using templates in practice 135 Avoiding performance bottlenecks from templates 136 Creating virtual resources 137 Realizing resources more flexibly using collectors 140 Exporting resources to other agents 141 Exporting and importing resources 142 Configuring the master to store exported resources 142 Exporting SSH host keys 143 Managing hosts files locally 144 Automating custom configuration items 144 Simplifying the Nagios configuration 145 Maintaining your central firewall 146 Overriding resource parameters 147 Making classes more flexible through inheritance 148 Understanding class inheritance in Puppet 149 Naming an inheriting class 151 Making parameters safer through inheritance 151 Saving redundancy using resource defaults 152 Avoiding antipatterns 154 Summary 155 Understanding the need for separate data storage 158 Consequences of defining data in the manifest 159 Structuring configuration data in a hierarchy 161 Configuring Hiera 163 Storing Hiera data 164 Choosing your backends 165 Retrieving and using Hiera values in manifests 165 Working with simple values 166 Binding class parameter values automatically 167 Handling hashes and arrays 170 Converting resources to data 172 Choosing between manifest and Hiera designs 175 Using Hiera in different contexts 175 A practical example 177 Debugging Hiera lookups 179 Summary 180 [ iv ] Table of Contents Chapter 8: Configuring Your Cloud Application with Puppet 181 Typical scopes of Puppet 182 Common data center use – roles and profiles 183 Taking Puppet to the cloud 184 Initializing agents in the cloud 185 Using Puppet's cloud-provisioner module 186 Building manifests for the cloud 187 Mapping functionalities to nodes 187 Choosing certificate names 190 Creating a distributed catalog 191 Composing arbitrary configuration files 194 Handling instance deletions 197 Preparing for autoscaling 198 Managing certificates 198 Limiting round trip times 200 Ensuring successful provisioning 202 Adding necessary relationships 203 Testing the manifests 204 Summary 205 Index 207 [v] Preface The software industry is changing and so are its related fields. Old paradigms are slowly giving way to new roles and shifting views on what the different professions should bring to the table. The DevOps trend pervades evermore workflows. Developers set up and maintain their own environments, and operations raise automation to new levels and translate whole infrastructures to code. A steady stream of new technologies allows for more efficient organizational principles. One of these newcomers is Puppet.

What you have learned will most likely satisfy your immediate requirements. For information beyond these lessons, don't hesitate to look up the excellent online documentation at https://docs. or join the community and ask your questions on chat or in the mailing list. Thanks for reading, and have lots of fun with Puppet and its family of DevOps tools. [ 206 ] Index A agents initializing, in cloud 185 resources, exporting to 141 anchor pattern about 90 URL 91 antipatterns avoiding 154, 155 apt-get command 8 arrays 15 autorequire feature 125 autoscaling feature about 198 certificates, managing 198-200 round trip times, limiting 200-202 autosigning URL 200 autosigning script 198 B backends selecting 165 URL, for online documentation 165 beaker about 105 URL 105 before metaparameter 19, 21, 24 C classes about 66 component classes, writing 73, 74 comprehensive classes, writing 71, 72 creating, with parameters 92 declaring 66, 67 defining 66, 67 definitions, nesting 82 differentiating, with defined types 69, 70 include keyword, preferring 93 parameterized classes, consequences 92, 93 class inheritance 149 cloud agents, initializing in 185 manifests, building for 187 cloud-provisioner module using 186 collectors used, for realizing resources 140, 141 component classes writing 73, 74 composite design 71 comprehensive classes writing 71, 72 configuration data structuring, in hierarchy 161, 162 containers events, passing between classes and defined types 83-85 limitations 86-89 limitations, mitigating 90 ordering 86 relationships, establishing among 83 containers, limitations anchor pattern 90 contain function 91 control structures adding, in manifest 13, 14 creates parameter 28 cron resource type 29 custom attribute 191 custom facts about 53 Facter, extending with 53-55 custom functions about 96 used, for refining custom module interface 126-128 custom module building 105 enhancing, through facts 125 implementing 106-109 interface, refining through custom functions 126-128 making, portable across platforms 128, 129 naming 106 using 106 utilities, creating for derived manifests 110 custom types 117 D data resources, converting to 172-174 data, defining in manifest consequences 159, 160 defined types about 66 creating 67-69 differentiating, with classes 69, 70 used, for exploiting array values 78-81 using 67-69 using, as macros 77, 78 using, as resource multiplexers 76 using, as resource wrappers 74, 75 dependency 20 documentation, modules 98, 99 domain-specific language (DSL) 8 dynamic configuration files templating 134 dynamic scoping 154 E enabled property 10 ensure property 10 environment.conf file 100 environment locations configuring 100, 101 environments maintaining 99, 100 modules, installing 101, 102 modules, obtaining 101, 102 used, for testing modules 104, 105 evaluation order circular dependencies, avoiding 21, 22 controlling 16 dependencies, declaring 17-20 error propagation 20 events about 23 passing, between classes and defined types 83-85 exec resource type 27 external facts using 55, 56 External Node Classifiers (ENCs) 174 F Faces 186 Facter example 62 extending, with custom facts 53-55 goals 57 systems, summarizing with 50, 51 facts URL, for documentation 125 used, for enhancing custom module 125 fact values accessing 52, 53 using 52, 53 flexibility, providing to classes about 148 class inheritance 149 inheriting class, naming 151 parameters, making safer through inheritance 151 [ 208 ] Forge modules' characteristics, identifying 130 URL 130 used, for searching modules 130 fqdn_rand function 41 fully qualified domain name (FQDN) 52 G group resource type 26 H hashes 14 Hiera arrays, handling 170-172 class parameter values, binding 167-169 configuring 163 data, storing 164 hashes, handling 170-172 lookups, defining 179 practical example 177, 178 using, in different contexts 175, 176 values, retrieving 165 values, using in manifest 165 working with simple values 166, 167 hiera_array function 170 hiera_hash function 171 hierarchy configuration data, structuring in 161, 162 I immutability, variables 14 include keyword preferring 93 Infrastructure as a Service (IaaS) 184 Infrastructure as Code paradigm 105 inheriting class naming 151 installation, modules 101, 102 instances method 123 M manifest about 182 control structures, adding in 13, 14 dry-testing 12 structure 9 manifest, and Hiera designs selecting between 175 manifest, building for cloud about 187 arbitrary configuration files, composing 194-196 certificate names, selecting 190, 191 distributed catalog, creating 191-194 functionality, mapping to nodes 187-189 instance deletions, handling 197, 198 metaparameters 18 model substantiating, with providers 59, 60 modules about 96 agent, enhancing through plugins 116, 117 best practices 102 content structure 97, 98 documentation 98, 99 generalization, avoiding 103 identifying, in Forge 130 important parts 96 installing 101, 102 manifest files, gathering 102, 103 obtaining 101, 102 searching, in Forge 130 testing 104 testing, with environments 104, 105 URL, for publishing 98 monolithic implementation 71 mount resource type 29, 30 N Nginx about 45 Phusion Passenger, using with 45, 46 nodes file 100 Notice keyword 20 [ 209 ] O operatingsystemrelease fact 53 output interpreting, of puppet apply command 11, 12 P Proudly sourced and uploaded by [StormRG] Kickass Torrents | TPB | ExtraTorrent | h33t parameterized classes consequences 92, 93 parameters versus properties 10 parser functions 96 performance bottlenecks avoiding, from templates 136 performance considerations about 42 basic tuning 46 Passenger, using with Nginx 45 switching, to Phusion Passenger 43, 44 Phusion Passenger switching to 43, 44 URL, for installation instructions 45 using, with Nginx 45, 46 Platform as a Service (PaaS) 184 plugins about 116 custom types, creating 118 custom types, naming 118 management commands, declaring 121 provider, adding 121 provider, allowing to prefetch existing resources 123, 124 provider functionality, implementing 122, 123 resource names, using 120 resource type interface, creating 119 sensible parameter hooks, designing 120 types, making robust 125 used, for enhancing modules agent 116, 117 plugins, types custom facts 116 parser functions 116 providers 116 types 116 processorcount fact 52 properties about 10 versus parameters 10 providerless resource types 61 provider parameter 10 providers model, substantiating with 59, 60 summarizing 61 Puppet about 182 installing 8 modules 96 typical scopes 182 URL 182 Puppet agent certificate, renewing 40 life cycle 38, 39 running, from cron 41 setting up 35-37 puppet apply command about 9, 31 output, interpreting of 11, 12 PuppetBoard 186 Puppet Dashboard 186 Puppet Explorer 186 Puppet Labs URL 8 URL, for advanced approaches 43 URL, for core resource types 61 URL, for style guide 52 URL, for system installation information 32 URL, for Troubleshooting section 47 puppetlabs-strings module URL 99 Puppet master about 31 configuration settings, inspecting 35 master machine, setting up 32 master manifest, creating 33, 34 tasks 32 puppetmaster system service 33 puppet module install command 101 Puppet support, for SSL CSR attributes URL 199 [ 210 ] Puppet, taking to cloud about 184 agents, initializing 185 cloud-provisioner module, using 186 Puppet toolchain 46 rspec-puppet module about 105 URL 105 R separate data storage need for 158 singletons 135 site manifest 33 SSL troubleshooting 47, 48 stdlib module 101 strings 15 subscribe metaparameter 23 successful provisioning, ensuring about 202 manifests, testing 204, 205 necessary relationships, adding 203 systems summarizing, with Facter 50, 51 S realize function 138, 139 redundancy saving, resource defaults used 152, 153 relationships, containers performance implications 89 require metaparameter 19 resource chaining 17 resource defaults used, for saving redundancy 152, 153 resource interaction implementing 22-24 resource parameters overriding 147, 148 resources about 10 converting, to data 172-174 exporting 142 exporting, to agents 141 importing 142 realizing, collectors used 140, 141 resources, exporting about 141 central firewall, maintaining 146 custom configuration, automating 144 hosts files, managing 144 master configuration, for storing exported resources 142 Nagios configuration, simplifying 145, 146 SSH host keys, exporting 143 resource type life cycle, agent side 58, 59 resource types cron 29 examining 25, 26 exec 27, 28 group 26 mount 29, 30 user 26 revocation 39 Roles and Profiles pattern 183 T templates performance bottlenecks, avoiding from 136 using 135, 136 template syntax learning 134, 135 transaction 57 Trusted Facts 189 types about 117 summarizing 61 type system 57 typical scopes, Puppet about 182 profiles 183, 184 roles 183, 184 U user resource type 26 utilities, custom module complexity, dealing 115, 116 configuration items, adding 111, 112 creating, for derived manifests 110 [ 211 ] customization, allowing 113 unwanted configuration items, removing 114, 115 W Warning keyword 20 V Y Vagrant 182 variables using 14 variable types about 14 arrays 15 hashes 14 strings 15 virtual resources creating 137, 138 yum command 8 [ 212 ] Thank you for buying Puppet Essentials About Packt Publishing Packt, pronounced 'packed', published its first book "Mastering phpMyAdmin for Effective MySQL Management" in April 2004 and subsequently continued to specialize in publishing highly focused books on specific technologies and solutions.


pages: 56 words: 16,788

The New Kingmakers by Stephen O'Grady


Amazon Web Services, barriers to entry, cloud computing, correlation does not imply causation, crowdsourcing, DevOps, Jeff Bezos, Khan Academy, Kickstarter, Mark Zuckerberg, Netflix Prize, Paul Graham, Silicon Valley, Skype, software as a service, software is eating the world, Steve Ballmer, Steve Jobs, Tim Cook: Apple, Y Combinator

Internally, Netflix oriented its business around its developers. As cloud architect Adrian Cockcroft put it: The typical environment you have for developers is this image that they can write code that works on a perfect machine that will always work, and operations will figure out how to create this perfect machine for them. That’s the traditional dev-ops, developer versus operations contract. But then of course machines aren’t perfect and code isn’t perfect, so everything breaks and everyone complains to each other. So we got rid of the operations piece of that and just have the developers, so you can’t depend on everybody and you have to assume that all the other developers are writing broken code that isn’t properly deployed.


pages: 234 words: 57,267

Python Network Programming Cookbook by M. Omar Faruque Sarker


business intelligence, cloud computing, Debian, DevOps, Firefox, inflight wifi, RFID, web application

Faruque Sarker Reviewers Ahmed Soliman Farghal Vishrut Mehta Tom Stephens Deepak Thukral Acquisition Editors Aarthi Kumarswamy Owen Roberts Content Development Editor Arun Nadar Technical Editors Manan Badani Shashank Desai Copy Editors Janbal Dharmaraj Deepa Nambiar Karuna Narayanan Project Coordinator Sanchita Mandal Proofreaders Faye Coulman Paul Hindle Joanna McMahon Indexer Mehreen Deshmukh Production Coordinator Nilesh R. Mohite Cover Work Nilesh R. Mohite About the Author Dr. M. O. Faruque Sarker is a software architect, and DevOps engineer who's currently working at University College London (UCL), United Kingdom. In recent years, he has been leading a number of Python software development projects, including the implementation of an interactive web-based scientific computing framework using the IPython Notebook service at UCL.


pages: 924 words: 241,081

The Art of Community by Jono Bacon


barriers to entry, collaborative editing, crowdsourcing, Debian, DevOps,, Firefox, game design, Johann Wolfgang von Goethe, Jono Bacon, Kickstarter, Mark Zuckerberg, openstreetmap, Richard Stallman, side project, Silicon Valley, Skype, slashdot, social graph, software as a service, telemarketer, union organizing, VA Linux, web application

As such, we needed to pick which events we wanted him to attend, and pick wisely. With this in mind I asked Jorge to put together a spreadsheet that listed all the events that could be interesting for us to attend. The focus of this list was clear: these need to be cloud events and oriented around technology (as opposed to business events) and DevOps (the audience we were focusing on). I asked Jorge to gather this list of events and to determine the following characteristics for each one: Location and venue Date(s) of the event Typical attendance size Number of sessions and average talk audience size Team priority Each of these pieces of information helped to provide an overview of each event and its respective details.