CodeMash 2018 reviews

I’m writing my thoughts here about the sessions I attended at CodeMash 2018. Overall, I had a great time and learned some new and important things about the latest technologies. This year I tried to attend more data science / machine learning talks, though several of them were cancelled at the last minute. (Perhaps the bad weather was to blame.)

I wrote these notes during or immediately after each talk, and I submitted some of the text to the session survey part of the Attendee Hub app on my phone. I think it’s a good idea to send the feedback to the speakers, but I don’t think it is used by reviewers for next year, so I’m not sure it matters very much. Regardless, I write to consolidate my own learning, if for no other reason.

Fast Neural Networks – a no brainer

Speakers: Riccardo Terrell

The speaker used the agent model because ANNs are embarrassingly parallel. Map agents to nodes 1:1 and send updates between them (values forward, backpropagation corrections backward). Make agents reactive to messages (updates from dependent nodes). The speaker has an e-book on MEAP about it. Treating a parallel NN as a map-reduce problem is a simpler way to implement an NN.
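A minimal sketch of the "NN as map-reduce" idea, in plain Python. This is not the speaker's agent-based implementation: there are no actors or messages here, it runs sequentially, and it only covers the forward pass. The function names, the sigmoid activation, and the weights are all assumptions made for illustration.

```python
import math
from functools import reduce

def neuron_output(weights, bias, inputs):
    """One node: map each input to a weighted value, then reduce to a sum."""
    weighted = map(lambda pair: pair[0] * pair[1], zip(weights, inputs))  # map step
    total = reduce(lambda acc, v: acc + v, weighted, bias)                # reduce step
    return 1.0 / (1.0 + math.exp(-total))                                 # sigmoid (assumed)

def layer_forward(layer, inputs):
    """Each node in a layer is independent of the others, so this loop
    is the part that could be handed to parallel workers or agents."""
    return [neuron_output(weights, bias, inputs) for weights, bias in layer]

# Hypothetical two-node layer with made-up weights.
layer = [([0.0, 0.0], 0.0), ([1.0, -1.0], 0.0)]
outputs = layer_forward(layer, [0.5, 0.5])
print(outputs)
```

Both nodes end up with a weighted sum of zero here, so both outputs are 0.5 under the assumed sigmoid.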

The speaker tried to cover too much content. Also, I didn’t hear any good outcomes or reasons for why he reimplemented backpropagation (a very old algorithm). What was learned by redoing it with actors?

Getting Started with Deep Learning

Speakers: Seth Juarez

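Confusion matrix: be careful about where the TN and FP boxes are located, because the layout varies between tools. The speaker works for Microsoft and uses Visual Studio to edit Python. He showed a TensorFlow implementation in Python for MNIST. Check Channel 9 on MSDN for a newer version of the presentation.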
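A minimal sketch of why the box location matters, in plain Python. The labels are made up, and the two layouts are assumptions about common conventions: the first (rows = actual, columns = predicted, negative class first) matches scikit-learn's default ordering for 0/1 labels, while many textbooks draw the positive class first.

```python
def confusion_counts(actual, predicted):
    """Count TN, FP, FN, TP for binary labels (1 = positive, 0 = negative)."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1   # predicted positive, actually negative
        else:
            counts["FN"] += 1   # predicted negative, actually positive
    return counts

# Made-up labels for illustration only.
actual    = [1, 0, 1, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 1, 1, 1, 0, 0]
c = confusion_counts(actual, predicted)

# Layout A: rows = actual, columns = predicted, negative class first.
matrix_neg_first = [[c["TN"], c["FP"]],
                    [c["FN"], c["TP"]]]
# Layout B: same axes, positive class first.
matrix_pos_first = [[c["TP"], c["FN"]],
                    [c["FP"], c["TN"]]]

print(matrix_neg_first)  # [[3, 2], [1, 2]]
print(matrix_pos_first)  # [[2, 1], [2, 3]]
```

The same predictions put FP in the top-right box in one layout and the bottom-left box in the other, so it is worth checking the axis labels before reading any cell.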

Cool Twilio use: text the presenter to ask questions during the presentation.

This speaker was great because he told lots of jokes to keep the talk interesting despite the underlying math. I really liked the Twilio-based questions via text messages!

Imposter Syndrome: Overcoming Self-Doubt in Success

Speakers: Heather Downing

The speaker did a very good job motivating the subject, but I would’ve liked to hear more about practical things that can be done to deal with imposter syndrome. I agree with “co-bragging”, but I think other good options include public speaking to build confidence or doing volunteer work to appreciate how it’s a 1st world problem.

cycle of failure: overly confident – procrastinate and rely on luck. under confident – put in excessive time and effort. Discount success and undermine ego.

How can you tell if someone will eventually succeed after a failure? You are not the failure; own your mistakes and learn from them. Don’t be scared of failure; be scared of not finding the truth.

“Co-bragging” – praise your co-workers’ achievements, and they praise you. Create a positive culture that avoids hurtful comparisons.

Fake it until you become it, then pay it forward.

Machine Learning at Scale, How to Keep Learning as Your Data Keeps Increasing

Speakers: Matt Winkler

This is exactly why I come to CodeMash – I want to hear about the latest tech so that I can keep up with the industry standards. I thought the speaker did an excellent job of reviewing the latest ML implementations and describing how to deploy them at scale. I loved all the detailed examples!

data prep: Spark, pandas, dplyr
scale up: spark cluster, HDInsight
aggregation: AML workbench

Azure Machine Learning Workbench can automatically learn how to create columns by example (such as formatting and aggregating a date-time column). It automatically generates Python code!

Nvidia’s latest GPU was announced at NIPS conference (academic AI researchers).

VM recommendations: use version control so that you can migrate easily. Make scripts for any required setup. Track your outcomes from experimenting with different models. Benchmark the price effectiveness of different configurations.

Home camera: Amazon’s DeepLens recognizes faces.

MSSQL 2017 has integrated ML algorithms INSIDE it! see tutorials?!

Walking the High Wire: Patching Erlang Live

Speakers: John Daily

The speaker did a great job of motivating the use of Erlang for live patches, which is exactly what I wanted to hear about. But I would’ve liked to have seen a non-trivial example, or some more information about how it’s used in the real world.

Power isn’t pretty – Erlang is designed for fault tolerance, not usability. Its network IO is fundamentally async but reliable, unlike RPC or CORBA.

App architecture without RDBS vs NoSQL drama

Speakers: Jeff Putz

The speaker has a lot of good experience to share. He’s obviously worked in diverse applications, and I appreciate hearing about tradeoffs between technologies instead of just advertising the latest tech. I think it’s really important to repeat the message about not being obsessed with the latest tech just because it’s new. I wish the presentation didn’t get sidetracked by arguments about issues of personal preference in DB design.

NoSQL advantages – less CPU, high write throughput, maybe higher dev productivity

Fight the urge to normalize everything. Don’t make complicated schema for queries that will never be used. Focus on the problem domain, not the persistence and code style. Running multiple queries can be OK (as compared to a join).

SQL can do key value pairs OK. The death of SQL in 2010 was greatly exaggerated.

Aggregate queries with joins cause lots of database work. Avoid redoing them in real time. Use the client layer to maintain a cached state of frequently queried aggregates; don’t be afraid to store redundant data because it’s so cheap now. SQL was originally designed to minimize storage at the expense of CPU (i.e. normalization).
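A minimal sketch of the cached-aggregate idea, using Python's built-in sqlite3 module. The `OrderStore` class, the `orders` table, and the per-customer total are hypothetical examples, not something shown in the talk. The application layer keeps a redundant running total that is updated on every write, so reads never have to rerun the aggregate query.

```python
import sqlite3

class OrderStore:
    """Hypothetical store that keeps a redundant, cached aggregate."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE orders (customer TEXT, cents INTEGER)")
        self.totals = {}  # customer -> running total, kept in the app layer

    def add_order(self, customer, cents):
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO orders VALUES (?, ?)", (customer, cents))
        # Update the cache at write time instead of re-aggregating at read time.
        self.totals[customer] = self.totals.get(customer, 0) + cents

    def total_cached(self, customer):
        return self.totals.get(customer, 0)

    def total_from_db(self, customer):
        # The expensive path: recompute the aggregate from the raw rows.
        row = self.db.execute(
            "SELECT COALESCE(SUM(cents), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]

store = OrderStore()
store.add_order("alice", 1250)
store.add_order("alice", 800)
store.add_order("bob", 300)
print(store.total_cached("alice"), store.total_from_db("alice"))
```

The tradeoff is the one from the talk: the total is stored twice, which costs a little storage and some care on the write path, in exchange for cheap reads.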

Scala for the Java Developer: It’s Easier Than You Think

Speakers: Justin Pihony

I liked this talk because I want to learn at least one new programming paradigm every time I come to CodeMash, and the speaker did a great job of being an ambassador for Scala. I appreciated hearing about the limitations and realistic expectations for the language. Things to improve: show applications and companies using it.

Scala runs on the JVM and is compatible with Java, but it’s functional and immutable first. Less verbose; fixes many annoyances with Java’s legacy conventions. Includes a REPL.

Ride the rails: Handling errors the functional way

Speakers: Sam Hanes

The speaker gave a very good talk on functional programming basics in F#. I enjoy sessions like these for reminding me that there are alternatives to traditional imperative paradigms. Suggestion: use font colors with better contrast (dark red text on black backgrounds is difficult to read). Overall very good talk.

Functional programming – avoid mutable state.

Use bind to adapt a switch function (one-track input, two-track output) so that it accepts two-track input and can be chained with other steps. Exceptions can be converted into failures (if they are predictable enough to catch).
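A minimal sketch of bind and two-track error handling. The talk used F#; this version is in Python, and the `Ok`/`Err` types, the `try_catch` helper, and the validation steps are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ok:
    value: object   # success track

@dataclass(frozen=True)
class Err:
    error: str      # failure track

def bind(switch_fn):
    """Adapt a switch function (value -> Ok/Err) to take two-track input."""
    def two_track(result):
        if isinstance(result, Ok):
            return switch_fn(result.value)
        return result  # already a failure: skip this step and pass it along
    return two_track

def try_catch(fn, exc_type):
    """Convert a function that may raise `exc_type` into a switch function."""
    def switch_fn(value):
        try:
            return Ok(fn(value))
        except exc_type as exc:
            return Err(f"{type(exc).__name__}: {exc}")
    return switch_fn

def not_blank(text):
    stripped = text.strip()
    return Ok(stripped) if stripped else Err("input is blank")

parse_int = try_catch(int, ValueError)

def positive(n):
    return Ok(n) if n > 0 else Err("must be positive")

def validate(text):
    result = not_blank(text)
    for step in (bind(parse_int), bind(positive)):
        result = step(result)
    return result

print(validate(" 42 "))  # Ok(value=42)
print(validate("-3"))    # Err(error='must be positive')
print(validate("abc"))   # an Err carrying the ValueError message
```

Once a step returns `Err`, every later step is bypassed, so the calling code only needs to check the final result.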

A Game of Theories: why languages do what they do

Speakers: Rae Krantz

The speaker gave a fun talk comparing programming languages to Game of Thrones. This type of talk is nice to have as a break in between “serious business” and learning new tech. Suggestions: use the full time slot. Show the same algorithm implemented in different languages. Talk more about the supporting libraries (not just the language itself).

Ruby, Python, Go, Erlang, Clojure, JavaScript – not sure why these 6 as opposed to any other survey of languages. Popularity?

(I went to this talk because the session “The Polyglot Data Scientist – Adventures with R, Python, and SQL” by Sarah Dutkiewicz was cancelled. It sounded like the snow storm scared a lot of people into leaving CodeMash early, and some other speakers cancelled their talks because they were sick.)

Image Recognition with Convolutional Neural Networks using Keras and CoreML

Speakers: Tim LeMaster

This talk was too introductory, barely covering any applications and only talking about history. I walked out after 10 minutes to go eat lunch and do professional networking.

How to Count Your Chickens After They’ve Hatched

Speakers: Gary Short

The speaker is very entertaining and amusing, and it’s great to see a fun talk about a relevant topic (ML). Images are easy to relate to, and it’d be cool to see more talks with them. I don’t think C# was the right choice for this algorithm – Python’s scikit-learn has built-in implementations for this problem.

Counting chickens in brightfield images: threshold the grayscale image, then try k-NN. k-NN doesn’t work because k is unknown? The algorithm is pretty ad hoc, but more power to him if it works. It was 85% accurate, but the customer (a farmer) told him to deploy it anyway.
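A minimal sketch of the thresholding step followed by a simple count, in plain Python. This is not the speaker's algorithm: it swaps in connected-component counting (flood fill) after the threshold, and it assumes the objects are brighter than the background. The tiny image, the threshold value, and the function name are made up.

```python
def count_blobs(gray, threshold):
    """Threshold a grayscale image, then count 4-connected bright regions."""
    rows, cols = len(gray), len(gray[0])
    mask = [[px > threshold for px in row] for row in gray]  # binary threshold
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            blobs += 1               # found a new, unvisited region
            seen[r][c] = True
            stack = [(r, c)]
            while stack:             # flood fill the rest of this region
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return blobs

# Made-up 4x5 grayscale image (0-255) with three bright regions.
image = [
    [ 10, 200, 210,  12,  11],
    [  9, 220,  15,  10, 190],
    [ 12,  11,  10,  13, 205],
    [180,  14,  12,  10,  11],
]
print(count_blobs(image, threshold=128))  # 3
```

A count like this breaks down when objects touch or overlap, which is presumably where the harder part of the real problem lies.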

[Sponsor Session] Attracting and Retaining Top Technical Talent (a.k.a. “Insomnia Cure #1 for Software Development Leaders”)

Speakers: Stout Systems

This talk was pretty good, and it made me feel better about my chances of getting a different job someday. On a daily basis, I really have no idea how good the market is for programmers, but I still remember the 2004 IT recession.

top 5 for recruiting/retaining talent –

5. salary (and PTO, retirement, health care, bonus, stock, options – cash equivalents)

4. workplace culture (office space features, remote work, flex schedule, work/life balance)

3. career growth (holding the same job for years is boring and a career killer; upskilling and training are good)

2. [lack of] process (shifting requirements, changing priorities, inconsistent deadlines, no deployments) – also leadership issues (excessive meetings, no clear decisions).

1. technology stack (fear of extinction; huge, messy codebase) – transform or evolve, within appropriate constraints (relevant tech, reasonable schedules). automate mundane tasks (build, deploy). allow some freedom of tools (svn vs git, OS, editors).

R Performance (It’s not R, it’s You)

Speakers: Tim Hoolihan

The speaker gave a pretty good overview of performance issues in R. I’m not really an R user, so I attended this talk just to see if it was much better than or different from Python-based ML. My conclusion: no, because R uses the Python ML libraries!

How Not to Destroy Data

Speakers: Michael Perry

I liked how this talk summarized an academic topic in a relatable way. Despite being in the last time slot, I learned some fascinating ideas about Historical Modelling. I like having some more challenging topics to attend.

Audit log problems – not reliable or type safe. Large. Simple

Event sourcing. Derive object state by reading whole table of changes. Order is significant.
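A minimal sketch of event sourcing in plain Python: the current state is never stored, only derived by replaying the whole log in order. The event kinds (`set_price`, `discount_percent`) and the price example are hypothetical, chosen to show that order is significant.

```python
from functools import reduce

def apply_event(state, event):
    """Return the new state after one event; the old state is not mutated."""
    kind, value = event
    if kind == "set_price":
        return {**state, "price": value}
    if kind == "discount_percent":
        return {**state, "price": state["price"] * (100 - value) // 100}
    raise ValueError(f"unknown event kind: {kind}")

def replay(events):
    """Derive the current state by folding over the full event log."""
    return reduce(apply_event, events, {"price": 0})

log = [("set_price", 100), ("discount_percent", 10), ("discount_percent", 50)]

print(replay(log))                   # {'price': 45}
print(replay(list(reversed(log))))   # {'price': 100}
```

Replaying the same events in reverse gives a different state, which is why a totally ordered log is a hard requirement here.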

Historical modelling. Partially ordered. Not simple, but better??

* Every field is immutable
* Surrogate key is only used internally (not in API)
* everything else is the identity in API

Use timestamps as “uniquifiers”

* A fact cannot be deleted
* query uses WHERE NOT EXISTS subclause
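A minimal sketch of the "a fact cannot be deleted" rule, using Python's built-in sqlite3 module. The `task` and `task_deleted` tables and the helper functions are hypothetical, not the speaker's schema. Deleting a task inserts a new fact that points at it, and the query for current tasks filters with WHERE NOT EXISTS.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE task (
        task_id    INTEGER PRIMARY KEY,  -- surrogate key, internal use only
        title      TEXT NOT NULL,
        created_at TEXT NOT NULL,        -- timestamp used as a uniquifier
        UNIQUE (title, created_at)       -- the identity exposed to callers
    );
    CREATE TABLE task_deleted (
        task_id INTEGER NOT NULL REFERENCES task(task_id)
    );
""")

def add_task(title, created_at):
    db.execute("INSERT INTO task (title, created_at) VALUES (?, ?)",
               (title, created_at))

def delete_task(title, created_at):
    # Deletion is recorded as a new fact; the task row itself is never removed.
    db.execute("""INSERT INTO task_deleted (task_id)
                  SELECT task_id FROM task
                  WHERE title = ? AND created_at = ?""",
               (title, created_at))

def current_tasks():
    rows = db.execute("""
        SELECT t.title FROM task t
        WHERE NOT EXISTS (
            SELECT 1 FROM task_deleted d WHERE d.task_id = t.task_id
        )
        ORDER BY t.created_at
    """)
    return [title for (title,) in rows]

add_task("write notes", "2018-01-12T09:00:00")
add_task("book hotel", "2018-01-12T09:05:00")
delete_task("book hotel", "2018-01-12T09:05:00")

print(current_tasks())  # ['write notes']
```

Both task rows are still in the table after the delete, so the full history stays available for auditing or conflict resolution.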

Speaker name predecessor record: a Name is identified by its nameId and all predecessor records. This is git-style version control inside a DB!

Mutable properties can’t be part of the entity. Detect and resolve conflicts by knowing the predecessors.

Events move entities forward thru workflow, pointing backward to previous event.

Advantages: no locks. Offline data is OK. Cluster synch is easier (eventual consistency via Active-Active clusters). Microservice as historical db.