Xiaoye Sherry Li

Research interests:

 Design and optimization of algorithms on parallel machines 
 High performance algebraic solvers 
 Sparse matrix computations 
 Mathematical software

I do research and provide support for mathematical software on the parallel machines at NERSC. Currently, I am heavily involved in the DOE SciDAC projects. Previously, I worked on the NSF NPACI project.
I received my Ph.D. in Computer Science from UC Berkeley.

Li’s Google Scholar Citations

Selected papers: (Full list, over 90 papers)

 Sparse matrix computations 
 Floating-point, high-precision arithmetic 
 Performance evaluation 
 Numerical optimization 
 Applications 

Software:

 SuperLU — Sequential and parallel libraries to solve unsymmetric sparse linear systems using LU factorization. 
 PDSLin — Parallel Domain-decomposition, Schur complement based hybrid sparse linear solver. 
 XBLAS — A reference implementation for the Extended and Mixed precision BLAS standard. 
 ARPREC — A C++/F90 package for performing arbitrary precision arithmetic. 
 CLAPACK — Full set of LAPACK in C. 
 ieee_except — A set of condition estimation routines, which are faster than those in LAPACK, by utilizing floating-point exception handling. It contains routines to manipulate IEEE exception sticky flags.

Selected Honors:

 SIAM Fellow, Society for Industrial and Applied Mathematics, 2016
 Elected to the SIAM Council, Jan. 2016 – Dec. 2018
 Invited Plenary Speaker, SIAM Conference on Applied Linear Algebra, October 26-30, 2015
 Invited Lecturer, the Fourth Gene Golub SIAM Summer School, July 22 – August 2, 2013
 Invited Plenary Speaker, SIAM Annual Meeting, July 12-16, 2010

Selected presentations:

Invited Plenary Speaker, Accelerating Direct Linear Solvers with Algorithmic and Hardware Advances, SIAM Conf. on Applied Linear Algebra, Oct. 26-30, 2015, Atlanta. (View presentation)
Invited Lecturer, short course on Factorization-based sparse solvers and preconditioners, 4th Gene Golub SIAM Summer School, July 22-Aug. 9, 2013, Shanghai. (Videos and course materials; Extended Summary in the book “Matrix Functions and Matrix Equations”, Z. Bai, W. Gao, Y. Su, editors, Series in Contemporary Applied Mathematics, World Scientific Publisher, Oct. 2015, pp. 109-137)
Towards an optimal-order approximate sparse factorization exploiting data-sparseness in separators, Invited Speaker, Workshop Celebrating 40 Years of Nested Dissection, July 22-23, 2013, Waterloo.
Invited Plenary Speaker, Factorization-based sparse solvers and preconditioners, SIAM Annual Meeting, July 12-16, 2010. (View presentation)
Towards an Optimal Parallel Approximate Factorization Using HSS Structures, Householder Symposium XVIII, June 13-17, 2011, Tahoe City, California.
A Supernodal Approach for ILU with Partial Pivoting, Sparse Days 2010 (Invited), CERFACS, June 15-17, 2010
Sparse matrix methods on high performance computers, Invited Lecture, CS267/Eng233, UCB, March 16, 2010.
Use of Semi-separable Approximate Factorization and Direction-preserving for Constructing Effective Preconditioners, SIAM Conference on Applied Linear Algebra, Monterey Bay-Seaside, Oct. 26-29, 2009. 
Performance Modeling Tools for Parallel Sparse Linear Algebra Computations, ParCo 2009, September 1-4, 2009, ENS-Lyon, France. 
Scalability Issues in Sparse Factorization and Triangular Solution, Sparse Days, June 23-24, 2008, CERFACS, Toulouse, France. 
Evaluation of sparse LU factorization and triangular solution on multicore architectures, VECPAR 2008, June 24-27, 2008, Toulouse, France. 
Algebraic Sub-structuring for Large-scale Electromagnetic Application, 16th International Conference on Domain Decomposition Methods, January 12-15, 2005, Courant Institute, New York University. 
A Comparison of Three High-Precision Quadrature Schemes, Experimental Math Workshop, March 29-30, 2004. Oakland, CA. 
Fill Reduction Algorithm Using Diagonal Markowitz Scheme with Local Symmetrization, SIAM Conference on Computational Science and Engineering, February 10-13, 2003, San Diego. 

Selected professional services:

 Associate Editor, ACM Trans. Math. Software (2006-present) 
 Associate Editor, SIAM J. Scientific Computing (2006-2009) 

Zach Holman

I’m a developer and startup advisor living in San Francisco. I like it here because it’s usually pretty warm but not too warm where I worry about wearing shorts and junk.

You can find me on Twitter, on GitHub, on AngelList, and on Instagram.

I’m an advisor with Dockbit, a neat way to deploy software.

I joined GitHub in 2010 as one of their first engineering hires, and helped build and grow their product and culture over five years, from nine employees all the way to 250.

Before that, I worked at Gild for a few years after graduating from Carnegie Mellon University in Pittsburgh. Go Stillers.

I grew up in Fargo, North Dakota, which really is lovely, even though there’s a really good chance you’ll freeze to death there.

Want to chat? Feel free to email me, or if you have a question that other people might be interested in as well, open a GitHub issue on holman/ama so that others can read it, too.

I like building things… so much so that I was the top committer at GitHub for the last two years of my time there.

Here are a few of the things I’ve shipped over the years:

GitHub: Issues (2014)

Issues is GitHub’s most popular feature, but it hadn’t been touched for years and years. I laid out the direction, started working on a replacement, and led the team that shipped GitHub’s third iteration of Issues.
speaking.io (2014)

I’ve been fortunate to share my experiences with thousands of people at conferences and meetups through lots of talks I’ve given over the years. speaking.io is a place where I can share what I’ve learned about public speaking.
GitHub: Conversation Locking (2014)

Dealing with problems in any community is tricky, and the programming community is unfortunately no different. I really wanted to give community leaders and project owners tools to help create a safe environment, and the ability to lock threads plays a big role with that.
GitHub: Web Flow (2012-2014)

I love Git, but it’s pretty damn complicated sometimes. A big focus I had at GitHub was to take things from the command line and make them easier on the web. I shipped branch creation and helped out with other features, like file creation and populating a license.
GitHub: Jobs (2012-2013)

I worked on GitHub Jobs for a few years between projects, mostly dealing with backend rewrites and refactoring. It was a fun little project to hack on from time to time.
GitHub: Various Talks (2011-2015)

Even though I was focused on product development, I kind of fell into a public speaking role while at GitHub, too. Some of my more popular talks include How GitHub Uses GitHub to Build GitHub, Git and GitHub Secrets, and Move Fast and Break Nothing.
GitHub: Enterprise (2011)

After working on GitHub Firewall Install for a few years, I helped kick off the transition of taking everything we had learned and rebuilding our enterprise offering. We also gave it a name that makes a lot more sense.
Good-Tutorials (2002-2012)

I founded and ran Good-Tutorials for a decade. It was the largest Photoshop tutorial site for quite some time before expanding into other software and languages. It was acquired in 2012.

medium.com: Being A Developer After 40

Hi everyone, I am a forty-two-year-old self-taught developer, and this is my story.

A couple of weeks ago I came across the tweet below, and it made me think about my career, and those thoughts brought me back to where it all began for me:

I started my career as a software developer at precisely 10am, on Monday October 6th, 1997, somewhere in the city of Olivos, just north of Buenos Aires, Argentina. The moment was Unix Epoch 876142800. I had recently celebrated my 24th birthday.
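For the curious, that epoch value can be checked with a few lines of Python. This is only a minimal sketch: it assumes Buenos Aires was on a fixed UTC-3 offset in October 1997 (modelled as a plain offset rather than a time zone database lookup), and the variable names are purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Buenos Aires local time was UTC-3 in October 1997,
# represented here as a fixed offset.
BUENOS_AIRES_1997 = timezone(timedelta(hours=-3))

# 10am on October 6th, 1997, local time.
first_day = datetime(1997, 10, 6, 10, 0, tzinfo=BUENOS_AIRES_1997)

# Seconds elapsed since 1970-01-01T00:00:00 UTC.
epoch_seconds = int(first_day.timestamp())

print(first_day.weekday())  # 0, which Python uses for Monday
print(epoch_seconds)        # 876142800
```

Under that assumption, both the weekday and the epoch value agree with the date given above.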
The World In 1997

The world was a slightly different place back then.

Websites did not have cookie warnings. The future of the web was portals like Excite.com. AltaVista was my preferred search engine. My e-mail was kosmacze@sc2a.unige.ch, which meant that my first personal website was located at http://sc2a.unige.ch/~kosmacze. We were still mourning Princess Diana. Steve Jobs had taken the role of CEO and convinced Microsoft to inject 150 million dollars into Apple Computer. Digital Equipment Corporation was suing Dell. The remains of Che Guevara had just been brought back to Cuba. The fourth season of “Friends” had just started. Gianni Versace had just been murdered in front of his house. Mother Teresa, Roy Lichtenstein and Jeanne Calment (the world’s oldest person ever) had just passed away. People were playing Final Fantasy 7 on their PlayStation like crazy. BBC 2 started broadcasting the Teletubbies. James Cameron was about to release Titanic. The Verve had just released their hit “Bitter Sweet Symphony” and then had to pay most royalties to the Rolling Stones.
Excite in 1997, courtesy of the Internet Archive

Smartphones looked like the Nokia 9000 Communicator; they had 8 MB of memory, a 24 MHz i386 CPU and ran the GEOS operating system.

Smartwatches looked like the CASIO G-SHOCK DW-9100BJ. Not as many apps but the battery life was much longer.

IBM Deep Blue had, for the first time, defeated Garry Kasparov at chess.

A hacker known as “_eci” published the C code for a Windows 3.1, 95 and NT exploit called “WinNuke,” a denial-of-service attack on TCP port 139 (NetBIOS) that caused a Blue Screen of Death.

Incidentally, 1997 is also the year Malala Yousafzai, Chloë Grace Moretz and Kylie Jenner were born.

Many film storylines take place in 1997; to name a few: Escape from New York, Predator 2, The Curious Case of Benjamin Button, Harry Potter and the Half-Blood Prince and The Godfather III. And according to Terminator 2: Judgment Day, Skynet became self-aware at 2:14 am on August 29, 1997. That did not happen; however, in an interesting turn of events, the domain google.com was registered on September 15th that year.

We were two years away from Y2K and the media were starting to get people nervous about it.
My First Developer Job

My first job consisted of writing ASP pages in various editors, ranging from Microsoft FrontPage to HoTMetaL Pro to EditPlus, managing cross-browser compatibility between Netscape Navigator and Internet Explorer 4, and writing stored procedures in SQL Server 6.5 powering a commercial website published in Japanese, Russian, English and Spanish, all without any consistent UTF-8 support across the software stack.

The product of these efforts ran on a Pentium II server hosted somewhere in the USA, with a stunning 2 GB hard disk drive and a whopping 256 MB of RAM. It was a single server running Windows NT 4, SQL Server 6.5 and IIS 2.0, serving around ten thousand visitors per day.

My first professional programming language was this mutant called VBScript, and of course a little bit of JavaScript on the client side, sprinkled with lots of “if this is Netscape do this, else do that” because back then I had no idea how to use JavaScript properly.

Interestingly, it’s 2016 and we are barely starting to understand how to do anything in JavaScript.

Unit tests were unheard of. The Agile Manifesto had not been written yet. Continuous integration was a dream. XML was not even a buzzword. Our QA strategy consisted of restarting the server once a week, because otherwise it would crash randomly. We developed our own COM+ component in Visual J++ to parse JPEG files uploaded to the server. As soon as JPEG 2000-encoded files started popping up, our component broke miserably.

We did not use source control, not even CVS, RCS or, God forbid, SourceSafe. Subversion did not exist yet. Our Joel Test score was minus 25.
6776 Days

For the past 6776 days I have had a cup of coffee in the morning and written code with things named VBScript, JavaScript, Linux, SQL, HTML, Makefiles, Node.js, CSS, XML, .NET, YAML, Podfiles, JSON, Markdown, PHP, Windows, Doxygen, C#, Visual Basic, Visual Basic.NET, Java, Socket.io, Ruby, unit tests, Python, shell scripts, C++, Objective-C, batch files, and lately Swift.

In those 6776 days lots of things happened; most importantly, my wife and I got married. I quit 6 jobs and I was fired twice. I started and closed my own business. I finished my Master’s degree. I published a few open source projects, and one of them landed me an article on Ars Technica by Erica Sadun herself. I was featured on Swiss and Bolivian TV shows. I watched live keynotes by Bill Gates and by Steve Jobs in Seattle and San Francisco. I spoke at and co-organised conferences on four continents. I wrote and published two books. I burned out twice (not the books, myself), and lots of other things happened, both wonderful and horrible.

I have often pondered leaving the profession altogether. But somehow, code always calls me back after a while. I like to write apps, systems, software. To avoid burning out, I have had to develop strategies.

In this talk I will give you my secrets, so that you too can reach the glorious age of 40 as an experienced developer, willing to continue in this profession.
Advice For The Young At Heart

Some simple tips to reach the glorious age of 40 as a happy software developer.
1. Forget The Hype

The first piece of advice I can give you all is: do not pay attention to hype. Every year there is a new programming language, framework, library, pattern, component architecture or paradigm that takes the blogosphere by storm. People go crazy about it. Conferences are given. Books are written. Gartner hype cycles rise and fall. Consultants charge insane amounts of money to teach, deploy or otherwise fuck up the lives of people in this industry. The press will support these horrors and will make you feel guilty if you do not pay attention to them.

In 1997 it was CORBA & RUP.

In 2000 it was SOAP & XML.

In 2003 it was Model Driven Architecture and Software Factories.

In 2006 it was Semantic Web and OLPC.

In 2009 it was Augmented Reality.

In 2012 it was Big Data.

In 2015… Virtual Reality? Bots?

Do not worry about hype. Keep doing your thing, keep learning what you were learning, and move on. Pay attention to it only if you have a genuine interest, or if you feel that it could bring you some benefit in the medium or long run.

The reason for this lies in the fact that, as the Romans said in the past, Nihil novi sub sole. Most of what you see and learn in computer science has been around for decades, and this fact is purposely hidden beneath piles of marketing, books, blog posts and questions on Stack Overflow. Every new architecture is just a reimagination and a readaptation of an idea that has been floating around for decades.
2. Choose Your Galaxy Wisely

In our industry, every technology generates what I call a “galaxy.” These galaxies feature stars but also black holes; meteoric changes that fade in the night, many planets, only a tiny fraction of which harbour some kind of life, and lots of cosmic dust and dark matter.

Examples of galaxies are .NET, Cocoa, Node.js, PHP, Emacs, SAP, etc. Each of these features evangelists, developers, bloggers, podcasts, conferences, books, training courses, consulting services, and inclusion problems. Galaxies are built on the assumption that their underlying technology is the answer to all problems. Each galaxy, thus, is based on a wrong assumption.

The developers from those different galaxies embody the prototypical attitudes that have brought that technology to life. They adhere to the ideas, and will enthusiastically wear the t-shirts and evangelize others about the merits of their choice.

Actually, I use the term “galaxy” to avoid the more appropriate but more controversial term “religion,” which might describe this phenomenon better.

In my personal case, I spent the first ten years of my career in the Microsoft galaxy, and the following nine in the Apple galaxy.

I dare say, one of the biggest reasons why I changed galaxies was Steve Ballmer. I got tired of the general attitude of the Microsoft galaxy people against open source software.

On the other hand, I also have to say that the Apple galaxy is a delightful place, full of artists and musicians and writers who, by chance or ill luck, happen to write software as well.

I attended conferences in the Microsoft galaxy, like the Barcelona TechEd 2003, or various Tech Talks in Buenos Aires, Geneva or London. I even spoke at the Microsoft DevDays 2006 in Geneva. The general attitude of developers in the Microsoft galaxy is unfriendly, “corporate” and bound in secrecy, NDAs and cumbersome IT processes.

The Apple galaxy was to me, back in 2006, exactly the opposite; it was full of people who were musicians, artists, painters; they would write software to support their passion, and they would write software with passion. It made all the difference, and to this day, I still tremendously enjoy this galaxy, the one we are in, right now, and that has brought us all together.

And then the iPhone came out, and the rest is history.

So my recommendation to you is: choose your galaxy wisely, enjoy it as much or as little as you want, but keep your telescope pointed towards the other galaxies, and prepare to make a hyperjump to other places if needed.
3. Learn About Software History

This takes me to the next point: learn how your favorite technology came to be. Do you like C#? Do you know who created it? How did the .NET project come to be? Who was the lead architect? What were the constraints of the project, and why did the language turn out to be what it is now?

Apply the same recipe to any language or CPU architecture that you enjoy or love: Python, Ruby, Java, whatever the programming language; learn their origins, how they came to be. The same for operating systems, networking technologies, hardware, anything. Go and learn how people came up with those ideas, and how long they took to grow and mature. Because good software takes ten years, you know.

The stories surrounding the genesis of our industry are fascinating, and will show you two things: first, that everything is a remix. Second, that you could be the one remixing the next big thing. No, scratch that: you are going to be the creators of the next big thing.

And to help you get there, here is my (highly biased) selection of history books that I like and recommend:

Dealers of Lightning by Michael A. Hiltzik
Revolution in the Valley by Andy Hertzfeld
The Cathedral and the Bazaar by Eric S. Raymond
The Success of Open Source by Steven Weber
The Old New Thing by Raymond Chen
The Mythical Man-Month by Frederick P. Brooks Jr.

You will also learn to value those things that stood the test of time: Lisp, TeX, Unix, bash, C, Cocoa, Emacs, Vim, Python, ARM, GNU make, man pages. These are some examples of long-lasting useful things that are something to celebrate, cherish and learn from.
4. Keep on Learning

Learn. Anything will do. Wanna learn Fortran? Go for it. Find Erlang interesting? Excellent. Think COBOL might be the next big thing in your career? Fantastic. Need to know more about Functional Reactive Programming? Be my guest. Design? Of course. UX? You must. Poetry? You should.

Many common concepts in Computer Science have been around for decades, which makes it worthwhile to learn old programming languages and frameworks; even “arcane” ones. First, it will make you appreciate the current state of the industry (or hate it, it depends), and second, you will learn how to use the current tools more effectively, if only because you will understand their legacy and origins.

Tip 1: learn at least one new programming language every year. I did not come up with this idea; The Pragmatic Programmer book did. And it works.

One new programming language every year. Simple, huh? Go beyond the typical “Hello, World” stage, and build something useful with it. I usually build a simple calculator with whatever new technology I learn. It helps me figure out the syntax, it makes me familiar with the APIs or the IDE, etc.

Tip 2: read at least 6 books per year. I have shown above a list of six must-read books; that should keep you busy for a year. Here goes the list for the second year:

Peopleware by Tom DeMarco and Tim Lister
The Psychology of Computer Programming by Gerald M. Weinberg
Facts and Fallacies of Software Engineering by Robert L. Glass
The Design of Everyday Things by Don Norman
Agile!: The Good, the Hype and the Ugly by Bertrand Meyer
Rework by Jason Fried and David Heinemeier Hansson
Geekonomics by David Rice

(OK, those are seven books.)

Six books per year looks like a lot, but it only means one every 2 months. And most of the books I have mentioned in this presentation are not that long, and even better, they are outstandingly well written, they are fun and are full of insight.

Look at it this way: if you are now 20 years old, by the age of 30 you will have read over 60 books, and over 120 when you reach my age. And you will have played with at least 20 different programming languages. Think about it for a second.

Some of the thirteen books I’ve selected for you were written in the seventies, others in the eighties, some in the nineties, and most of them are from the past decade. They represent the best writing I have come across in our industry.

But do not just read them; take notes. Bookmark. Write on the pages of the books. Then re-read them every so often. Borges used to say that a bigger pleasure than reading a book is re-reading it. And also, please, buy those books you really like in paper format. Believe me. eBooks are overrated. Nothing beats the real thing.

Of course, please know that as you grow older, the number of things that qualify as new and/or important will drop dramatically. Prepare for this. It is OK to weep silently when you realise this.
5. Teach

Once you have learnt, teach. This is very important.

This does not mean that you should set up a classroom and invite people to hear your ramblings (although it would be awesome if you did!). It might mean that you give meaningful answers to questions on Stack Overflow; that you write a book; that you publish a podcast about your favorite technology; that you keep a blog; that you write on Medium; that you go to another continent and set up programming schools using Raspberry Pis; or that you help a younger developer by becoming their mentor (do not do this before the age of 30, though).

Teaching will make you more humble, because it will painfully show you how limited your knowledge is. Teaching is the best way to learn. Only by testing your knowledge against others are you going to learn properly. This will also make you more respectful regarding other developers and other technologies; every language, no matter how humble or arcane, has its place within the Tao of Programming, and only through teaching will you be able to feel it.

And through teaching you can really, really make a difference in this world. Back in 2012 I received a mail from a person who had attended one of my trainings. She used to work as an Adobe Flash developer. Remember ActionScript and all that? Well, unsurprisingly after 12 years of working as a freelance Flash developer she suddenly found herself unemployed. Alone. With a baby to feed. She told me in her message that she had attended my training, that she had enjoyed it and also learnt something useful, and that after that she had found a job as a mobile web developer. She wrote to me to say thank you.

I cannot claim that I changed the world, but I might have nudged it a little bit, into something (hopefully) better. This thought has made every lesson I gave since then much more worthwhile and meaningful.
6. Workplaces Suck

Do not expect software corporations to offer any kind of career path. They might do this in the US, but I have never seen any of that in Europe. This means that you are solely responsible for the success of your career. Nobody will tell you “oh, well, next year you can grow to be team leader, then manager, then CTO…”

Not. At. All. Quite the opposite, actually: you were, are and will be a software developer, that is, a relatively expensive factory worker, whose tasks your managers would be happy to offshore no matter what they tell you.

Do not take a job just for the money. Software companies have become sweatshops where you are supposed to justify your absurdly high salary with insane hours and unreasonable expectations. And, at least in the case of Switzerland, there is no worker union to help you if things go bad. Actually there are worker unions in Switzerland, but they do not really care about situations that will not land them some kind of media exposure.

Even worse: in most workplaces you will be harassed, particularly if you are a woman, a member of the LGBT community or from a non-Caucasian ethnic group. I have seen developers threatened with the non-renewal of their work visas if they did not work faster. I have witnessed harassment of women and gay colleagues.

Some parts of our industry are downright disgusting, and you do not need to be in Silicon Valley to live it. You do not need Medium to read it. You could experience that right here in Switzerland. Many banks have atrocious workplaces. Financial institutions want you to vomit code 15 hours a day, even if the Swiss working laws explicitly forbid such treatments. Pharmaceutical companies want you to write code to cheat test results and to help them bypass regulations. Startups want your skin, working for 18 hours for no compensation, telling you bullshit like “because we give you stock options” or “because we are all team players.”

It does not matter that you are Zach Holman and that you can claim in your CV that you literally wrote GitHub from scratch: you will be fired for the pettiest of reasons.

It does not matter that the app brings in more than half of your employer’s traffic and revenue; the API team will treat you and your ideas with contempt and sloppiness.

I have been asked to work for free by very well known people in the industry, some of them even featured in Wikipedia, and it is simply appalling. I will not give out their names, but I will prevent any junior from getting close to them, because people working without ethics do not deserve anyone’s brain.

Whenever an HR manager tells you “you must do this (whatever wrong thing in your frame of reference) because we pay you a salary,” remember to answer the following: “you pay me a salary, but I give you my brain in exchange, and I refuse to comply with this order.”

And to top it all, they will put you in an open space, and for some reason they will be proud about it. Open spaces are a cancer. They are without a doubt the worst possible office layout ever invented, and the least appropriate for software development — or any type of brain work for that matter.

Remember this: the fact that you understand something does not imply that you have to agree to it.

Disobey authority. Say “fuck you, I won’t do what you tell me” and change jobs. There are fantastic workplaces out there; not a lot, but they exist. I have been lucky enough to work in some of them. Do not let a bad job kill your enthusiasm. It is not worth it. Disobey and move on.

Or, better yet, become independent.
7. Know Your Worth

You have probably heard about the “10x Software Engineer” myth, right? Well here is the thing: it is not a myth, but it does not work the way you think it works.

It works, however, from the point of view of the employer: a “10x Software Engineer” generates worth 10 times whatever the employer pays. That means that she or he gets 100 KCHF per year, but is actually creating value worth over a million francs. And of course, they get the bonuses at the end of the fiscal year, because, you know, capitalism. Know your worth. Read Karl Marx and Thomas Piketty. Enough said.

Keep moving; be like the shark that keeps on swimming, because your skills are extremely valuable. Speak out your salary, say it loud, blog about it, so that your peers know how much their work is worth. Companies want you to shut up about that, so that women are paid 70% of what men are paid. So speak up! Blog about it! Tweet it! I am making 135 KCHF per year. That was my current salary. How about you? And you? The more we speak out, the less inequality there will be. Any person doing my job with my experience should get the same money, regardless of race, sex, age or preferred football team. End of the story. But it is not like that. It is not.
8. Send The Elevator Down

If you are a white male remember all the privilege you have enjoyed since birth just because you were born that way. It is your responsibility to change the industry and its bias towards more inclusion.

It is your duty to send the elevator down.

Make conscious decisions in your life. Be aware of your actions and their effect. Do not blush or be embarrassed for changing your opinions. Say “I’m sorry” when required. Listen. Do not be a hotshot. Have integrity and self-respect.

Do not criticize or make fun of the technology choices of your peers, for other people will have their own reasons to choose them, and they must be respected. Be prepared to change your mind at any time through learning. One day you might like Windows. One day you might like Android. I am actually liking some parts of Android lately. And that is OK.

9. LLVM

Everybody is raving about Swift, but in reality what I pay more attention to these days is LLVM itself.

I think LLVM is the most important software project today, as measured by its long-term impact. Objective-C blocks, Rust & Swift (the two most loved strongly typed and compiled programming languages in the 2016 Stack Overflow developer survey), Dropbox Pyston, the Clang Static Analyzer, ARC, Google Souper, Emscripten, LLVMSharp, Microsoft LLILC, RubyMotion, cheerp, watchOS apps, the Android NDK, Metal: all of these things were born out of or powered by LLVM. There are compilers using LLVM as a backend for pretty much all the most important languages of today. The .NET CLR will eventually interoperate with it, and Mono already uses it. Facebook has tried to integrate LLVM with HHVM, and WebKit recently switched from LLVM to the new B3 JIT JavaScript compiler.

LLVM is cross-platform, cross-CPU-architecture, cross-language, cross-compiler, cross-eyed-tested, free as in gratis and free as a bird.

Learn all you can about LLVM. This is the galaxy where true innovation is happening now. This is the foundation for the next 20 years.
10. Follow Your Gut

I had the gut feeling .NET was going to be big when I watched its introduction in June 2000. I had the gut feeling the iPhone was going to be big when I watched its introduction in 2007.

In both cases people laughed in my face, literally. In both cases I followed my gut feeling and I guess things worked out well.

Follow your gut. You might be lucky, too.
11. APIs Are King

Great APIs enable great apps. If the API sucks, the app will suck, too, no matter how beautiful the design.

Remember that chunky is better than chatty, and that clients should be dumb; push as much logic as you can down to the API.
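To make “chunky versus chatty” concrete, here is a minimal Python sketch. Everything in it is hypothetical: `FakeApi` is an in-memory stand-in for a remote service that simply counts round trips, and the method names and sample data are invented for illustration. The chatty client makes several small calls and computes the total itself; the chunky client makes one call and leaves the logic on the API side.

```python
from dataclasses import dataclass


@dataclass
class FakeApi:
    """Hypothetical in-memory stand-in for a remote API; counts round trips."""
    users: dict
    orders: dict
    round_trips: int = 0

    # Chatty style: one small call per piece of data.
    def get_user(self, user_id):
        self.round_trips += 1
        return self.users[user_id]

    def get_order_ids(self, user_id):
        self.round_trips += 1
        return [oid for oid, o in self.orders.items() if o["user_id"] == user_id]

    def get_order(self, order_id):
        self.round_trips += 1
        return self.orders[order_id]

    # Chunky style: the API assembles everything one screen needs.
    def get_user_dashboard(self, user_id):
        self.round_trips += 1
        user_orders = [o for o in self.orders.values() if o["user_id"] == user_id]
        return {
            "user": self.users[user_id],
            "orders": user_orders,
            "total": sum(o["amount"] for o in user_orders),
        }


def make_api():
    # Invented sample data: one user with two orders.
    return FakeApi(
        users={1: {"name": "Ada"}},
        orders={
            10: {"user_id": 1, "amount": 30},
            11: {"user_id": 1, "amount": 12},
        },
    )


# Chatty client: several round trips, plus business logic on the client.
chatty = make_api()
user = chatty.get_user(1)
orders = [chatty.get_order(oid) for oid in chatty.get_order_ids(1)]
chatty_total = sum(o["amount"] for o in orders)

# Chunky client: one round trip; the client only displays the result.
chunky = make_api()
dashboard = chunky.get_user_dashboard(1)

print(chatty.round_trips, chunky.round_trips)  # 4 1
print(chatty_total == dashboard["total"])      # True
```

On a real network each of those round trips costs latency and battery, which is why the single chunky call tends to win on mobile clients.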

Do not invent your own security protocols.

Learn a couple of server-side technologies, and make sure Node is one of those.

Leave REST aside and embrace Socket.io, ZeroMQ, RabbitMQ, Erlang, XMPP; explore realtime as the next step in app development. Realtime is not only for chat apps. Remove polling from the equation forever.

Oh, and start building bots around those APIs. Just saying.
12. Fight Complexity

Simpler is better. Always. Remember the KISS principle. And I do not mean only at the UI level, but all the way down to the deepest layers of your code.

Refactoring, unit tests, code reviews, pull requests, all of these tools are at your disposal to make sure that the code you ship is the simplest possible architecture that works. This is how you build resilient systems for the long term.
Conclusion

The most important thing to remember is that your age does not matter.

One of my sons said to me, “Impossible, Dad. Mathematicians do all their best work by the time they’re 40. And you’re over 80. It’s impossible for you to have a good idea now.”

If you’re still awake and alert mentally when you’re over 80, you’ve got the advantage that you’ve lived a long time and you’ve seen many things, and you get perspective. I’m 86 now, and it’s in the last few years that I’ve had these ideas. New ideas come along and you pick up bits here and there, and the time is ripe now, whereas it might not have been ripe five or 10 years ago.

Michael Atiyah, Fields Medal and Abel Prize winner Mathematician, quoted in a Wired article.

As long as your heart tells you to keep on coding and building new things, you will be young, forever.

In 2035, exactly 19 years from now, somebody will give a talk at a software conference similar to this one, starting like this:

“Hi, I am 42 years old, and this is my story.”

Hopefully one of you will be giving that presentation; otherwise, it will be an AI bot. You will provide some anecdotal facts about 2016, for example that it was the year when David Bowie, Umberto Eco, Gato Barbieri and Johan Cruyff passed away, or when SQL Server was made available on Linux, or when Google AlphaGo beat a Go champion, or when the Panama Papers and the Turkish Citizenship Database were leaked the same day, or when Google considered using Swift for Android for the first time, or as the last year in which people enjoyed this useless thing called privacy.

We will be three years away from the Year 2038 Problem and people will be really nervous about it.

Of course I do not know what will happen 19 years from now, but I can tell you three things that will happen for sure:

Somebody will ask a question on Stack Overflow about how to filter email addresses using regular expressions.
Somebody will release a new JavaScript framework.
Somebody will build something cool on top of LLVM.

And maybe you will remember this talk with a smile.

Thank you so much for your attention.

Blog George Gritsouk


I’m a web developer in Toronto, Canada. I have a degree in Nanotechnology Engineering from the University of Waterloo.

Human Git Aliases
Stubbornly Refusing to Speak The Computer’s Language

The most common .gitconfig I see is blank except for setting a username. The second most common is this:

[alias]
ci = commit
cia = commit -a
cam = commit --amend
cama = commit --amend -a

cl = clean
cldf = clean -df

res = reset
resa = reset HEAD

# 82 more 4-character aliases

This config basically trades space in your head for keystrokes. Save on typing by remembering short command aliases. I don’t love that. I make typos, and sometimes I don’t get enough sleep, and generally this is just going to make life harder on me. I shouldn’t be bending to suit the computer’s language, the computer should learn mine. I don’t care so much about having short commands, I have a shell with autocomplete that works. Instead, I use real words and try to make the whole thing more human.

My goals with git aliases are:

smooth out git’s unwieldy UI
make a few common workflows faster

For example, in git, trying to just get a list of something in the repository is insanely inconsistent. I fix it like so:

branches = branch -a
tags = tag
stashes = stash list

How about common operations for undoing work? I never want to Google “how to unstage a file”, there should just be a %$&#ing command to unstage a file.

unstage = reset -q HEAD --
discard = checkout --
uncommit = reset --mixed HEAD~
amend = commit --amend

I even have a nuclear version:

nevermind = !git reset --hard HEAD && git clean -d -f

which unstages changes in the index, discards changes in the working directory, and removes any new files.

I also really like having

graph = log --graph -10 --branches --remotes --tags --format=format:'%Cgreen%h %Creset• %<(75,trunc)%s (%cN, %cr) %Cred%d' --date-order

to see a real timeline of who is working on what and when. Another good example:

precommit = diff --cached --diff-algorithm=minimal -w

This is a key part of my workflow. I run this before every commit to make sure I don’t need to use the undo commands.

Bend the aliases to how you think and work, not the other way around. Let your aliases reflect your values, instead of just saving you keystrokes.

I got a few great suggestions from Reddit comments on this post:

unmerged = diff --name-only --diff-filter=U by kasbah and remotes = remote -v by WrongSubreddit are my favourites. Thank you!

A full list of my Git aliases is in my dotfiles repo.

bemycto.com: Building a blog with Jekyll in 5 points – Part 2: Templates


This post is the continuation of a previous article: Building a blog with Jekyll in 5 points – Part 1: Presentation. Please be sure you read it first.
For the second part of this Building a blog with Jekyll series, we’ll take a look at the Liquid template system. It’s quite cool and powerful, I’m sure you will love it.

Basic overview
Liquid is a template engine written in Ruby and created by Shopify for theming purposes. As it’s a powerful library, Jekyll uses it as its template engine.

Liquid is based on three concepts:

Objects and variables that you can output, assign and modify
Filters that allow you to transform your objects and variables data before outputting
Tags that change template flow and do complex stuff
Jekyll provides a lot of objects, filters, and tags. Using them, you can customize your website with little effort.

Output data
First things first: we need to output our data. We’re assuming our page has a page.title property which is equal to My awesome Title.

If we want to display this in an h1 tag, we can add this in our template:

<h1>{{ page.title }}</h1>

Note that Liquid does not escape HTML in the output for us: the value is inserted as is. That is what you want when, for example, you have HTML tags in your title and you’re absolutely sure that this data is safe. Otherwise, pass the value through the escape filter:

{{ page.title | escape }}

These are the basics of Jekyll templating. Now, we’ll take a look at the filters.

Filters
Filters help you transform your data in a convenient way. For example, imagine we’re sticking to the Ruby On Rails paradigm Convention over configuration.

We decided that using a slug of the title we can find its associated picture using this pattern:

/img/posts/${article_slug}.jpg
We can simply use the slugify filter to make this possible:

<img src="{{ site.baseurl }}/img/posts/{{ page.title | slugify }}.jpg" alt="{{ page.title }}">
Simply using the | operator, we can pass our data through the given filter to transform it. That’s incredibly powerful, and there are a ton of filters available. Don’t worry about site.baseurl; we’ll talk about it in part 3.

If you want a complete list of available filters, take a look at the documentation.
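
To make the slug-to-picture convention above more concrete, here is a minimal Ruby sketch of what a slugify step does. The slugify and post_image_path helpers are hypothetical names written for this illustration; they only approximate the behaviour of the Liquid filter and are not Jekyll’s own code.

```ruby
# Hypothetical helper approximating a "slugify" step (not Jekyll's implementation).
def slugify(title)
  title.downcase
       .gsub(/[^a-z0-9]+/, "-") # collapse each run of other characters into one hyphen
       .gsub(/\A-+|-+\z/, "")   # trim hyphens left over at either end
end

# Builds the picture path following the /img/posts/<slug>.jpg convention.
def post_image_path(title)
  "/img/posts/#{slugify(title)}.jpg"
end

puts post_image_path("My awesome Title")
```

In a template you would still use the filter itself; the sketch is only there to show why a title with spaces and capitals ends up as a URL-friendly file name.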

Tags
Tags are a powerful way to change your template flow. Typical tags are if blocks and for loops.

Tags are called like this:

{% my_tag and_some_optional_params %}
One of the most useful tags for reusing content is the include tag, especially combined with the for tag. For example:

{% for post in site.posts %}
{% include post_preview.html current_post=post %}
{% endfor %}
The file named _includes/post_preview.html will be loaded, and can access the include.current_post value. For example, to display only the post title:

{{ include.current_post.title }}

If you want a complete list of available tags, you guessed it: the documentation is there for this.

Customization
If you want to go further and create your own filters and tags, Jekyll allows it. We won’t cover this topic here, but you can find a lot of plugins that add new filters, tags and a lot of cool stuff.

For example in The WebTechie Review, I use two plugins:

One for generating the tag cloud
One that modifies categories permalinks to slugify them
Wrapping up together
Summing it all up, here is a quick sample of how we can display a posts list:

# Content of the index.html file
---
---

My site posts

{% for post in site.posts %}
{% include post_preview.html current_post=post %}
{% endfor %}
The empty metadata block (surrounded with 3 dashes) is mandatory for Jekyll to parse your file. Without it, Jekyll will copy the file as plain text without parsing it.

# Content of the post_preview.html file

{{ include.current_post.title }}

{{ include.current_post.title }}
{{ include.current_post.excerpt }}
{{ include.current_post.content | number_of_words }} words
{{ include.current_post.date | date_format_to_string }}
Given the following post:

# Content of _posts/2015-05-23-my-post.md
---
title: My post
date: 2015-05-23
---

This first paragraph will be the excerpt property. Please don’t truncate it without stripping tag before.

This second paragraph will only be displayed in the content property.
Will generate the following HTML:

My site posts

My post

My post

This first paragraph will be the excerpt property. Please don’t truncate it without stripping tag before.

27 words
2015-05-23 00:00:00 +0200
Pretty powerful, isn’t it?

Next time…
Things should be a little clearer now, but I imagine you still have many questions:

What are those variables like site.posts and such?
How can I build a real website using Jekyll?
How can I deploy it in production?
These three points are going to be explained in the next parts. If you have a question, please leave a comment 🙂

Melbourne Institute Working Paper Series Working Paper No. 7/16: Use It Too Much and Lose It? The Effect of Working Hours on Cognitive Ability


Using data from Wave 12 of the Household, Income and Labour Dynamics in Australia
(HILDA) Survey, we examine the impact of working hours on the cognitive ability of people
living in Australia aged 40 years and older. Three measures of cognitive ability are
employed: the Backward Digit Span; the Symbol Digits Modalities; and a 25-item version of
the National Adult Reading Test. In order to capture the potential non-linear dependence of
cognitive ability on working hours, the model for cognitive ability includes working hours
and its square. We deal with the potential endogeneity of the decision of how many hours to
work by using the instrumental variable estimation technique. Our findings show that there is
a non-linearity in the effect of working hours on cognitive functioning. For working hours up
to around 25 hours a week, an increase in working hours has a positive impact on cognitive
functioning. However, when working hours exceed 25 hours per week, an increase in
working hours has a negative impact on cognition. Interestingly, there is no statistical
difference in the effects of working hours on cognitive functioning between men and women.
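
The specification with working hours and its square can be written out as a short sketch. The symbols below ($C_i$ for the cognitive score, $h_i$ for weekly working hours, $X_i$ for other controls, and the coefficients) are notation assumed for this illustration, not taken from the paper:

```latex
C_i = \beta_0 + \beta_1 h_i + \beta_2 h_i^2 + X_i'\gamma + \varepsilon_i,
\qquad
\frac{\partial C_i}{\partial h_i} = \beta_1 + 2\beta_2 h_i = 0
\;\Longrightarrow\;
h^{*} = -\frac{\beta_1}{2\beta_2}
```

With $\beta_1 > 0$ and $\beta_2 < 0$, the score rises with hours up to $h^{*}$ and falls beyond it; the findings above place $h^{*}$ at around 25 hours a week.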

vitobotta.com: Migrating from WordPress to Jekyll – Part 2: Everything you need to know


So, as promised, here’s the second part of a two-part series on why and how I migrated this blog to Jekyll from the publishing platform I was using previously, WordPress. Here are the (technical) steps I had to take in order to complete the migration while preserving the site’s layout, usability and SEO characteristics.

Note: *I know that there are several other articles on the subject, however while I found many of them, I couldn’t easily find answers to questions such as where to find something, what data is available with Jekyll when building the site, and so on. So in this post, while I describe what I have done to migrate my blog, I have also tried to make those answers readily available for others who may have the same issues as I had when migrating.

It’s a pretty long post, with a lot of information on the WordPress => Jekyll migration, as well as on Jekyll in general, so I have organised the contents in various sections; if you are already familiar with Jekyll or are looking for a quick answer concerning something specific, feel free to jump to one section or another.

WordPress is a feature rich blogging platform that boasts a massive community and tons of extensions, so if you also have a blog powered by WordPress but are unsure as to why you may want to migrate, have a look at my previous post, in which I mentioned the reasons why I decided to migrate anyway, and that WordPress was not for me.

Your mileage may vary, of course, but if in the end you do want to migrate to Jekyll, herein you will find useful tips that will hopefully save you some time. On a side note, while the focus here is clearly on the WordPress => Jekyll migration, much of the information in this post can also be useful to users wanting to migrate from different blogging platforms, such as Blogger, Tumblr, Live Journal, and others.*

Update 22 Apr 2011: I’ve published a follow up on how to integrate a dynamic contact form, powered by Sinatra.

Introduction

Assuming you already are thinking of abandoning WordPress for some reason, and are also thinking that Jekyll may be a good alternative for you, let’s first have a brief look at how Jekyll differs from WordPress, before looking at how to actually migrate. This may help you make an informed choice after all.

To recap from the previous post, Jekyll:

  • is a simple, blog aware, static site generator. This in short means that it isn’t a CMS like WordPress, it doesn’t use a database, and all it does is basically generate a totally static site that can then be served directly by a web server like Nginx or Apache; if you have (hopefully) used a caching plugin with WordPress such as WP-SuperCache, then you already know that this means a faster site; deployment is also easier, in that all you’d have to do to deploy a Jekyll site is copy it to your production web server (although you may prefer using something like Rsync, Capistrano or Rake, as we’ll see later);
  • since the site is fully static and is preprocessed in advance (as opposed to WordPress with WP-SuperCache or similar where you still have dynamically built pages cached to static files), Jekyll does not need any server-side technologies (while for example WordPress needs PHP), therefore no application server is required either; however, we’ll see how “mixing” a fast static site generated by Jekyll with some server side technologies can still make sense in some cases;
  • publishing or updating an article or page basically means you need to rebuild the entire static site – at least using Jekyll the typical way – and this may sound bad. Rebuilding the site is almost immediate for this blog, for example, since it’s pretty new and doesn’t have many posts yet; however it may take a considerable amount of time to rebuild a site with many more posts and pages.

The three points above should carefully be taken into consideration when switching to Jekyll. For larger sites, it may be a case of better performance for the reader (Jekyll) vs faster publishing (WordPress). Depending on whether or not you are a Ruby developer and are used to tools such as Git or another SCM, you may prefer either WordPress’ administration interface or a developer’s tools to publish your articles. This, again, is very important and you need to be sure you’ll be happy with Jekyll’s more “manual” approach. Personally, I do prefer editing my posts with TextMate, versioning with Git, and both publishing and maintaining the site with Rake and Capistrano tasks. It may sound like a more complex workflow, but in reality it can be simpler and a lot quicker once you are used to shortcuts of any type between your shell and your text editor of choice.

However, since I bet many among the people who will end up reading this page are either non developers or are PHP developers who are used to different tools, it’s your call whether all this suits you or not.

Installing and using Jekyll

For starters, in order to use Jekyll as your new publishing platform, you’ll need to install it. Assuming you have a Ruby environment already configured, all you need for now is the jekyll gem:

gem install jekyll  

Once you’ve got the gem installed, you’ll need a folder which will contain the few files and folders Jekyll requires to work; at a minimum, Jekyll expects the following folder structure:

-rw-r--r--   _config.yml
drwxr-xr-x   _includes  
drwxr-xr-x   _layouts  
drwxr-xr-x   _posts  

So just three folders and a configuration file. As we’ll see later, my blog’s folder has got a lot more stuff because of various extensions, plugins and more. But the items listed above are the basic files and folders Jekyll needs to work. You can also find a lot of Jekyll sites hosted on Github, so you could even fork one of them and start from there (at the moment I am using my own Git server for this blog, but I will also push the code to Github when I have time).

_includes contains text files, typically HTML, that can be included like “widgets” in pages or layouts; if you are familiar with Ruby on Rails development, they are like partials in the context of Jekyll. _layouts, similarly, contains the layouts which can be used to give a consistent look to your pages – as in WordPress, you can have different layouts, for example with or without sidebars, or with different sets of includes (widgets in WordPress). Jekyll uses Liquid as its templating language. If you have never used it, it’s basically a templating language with a designer friendly syntax, and can be extended if you know your way around Ruby – I’ll show some examples on how to extend Liquid later on.

_posts will contain text files written in Markdown (so with the .markdown extension) which represent -you guessed it- your actual posts. When you run the jekyll binary (installed together with the gem) to build the static site, Jekyll will process these text files through a Markdown filter (you can choose from a few filters available, as we’ll see later) to convert them to posts in HTML. In a similar way, it will process “raw” pages through Liquid to generate the static, HTML pages that will then be served to clients (we’ll see in greater detail later that not all the pages are processed with Liquid, depending on one particular condition). It is important to remember that Jekyll expects post files to have names in the format YYYY-MM-DD-title.markdown, since it uses the first part of a post’s filename to determine the publishing date for the post. Unfortunately, Jekyll as is doesn’t handle the time when a post was published, and this – as we’ll see later – can cause a few issues although it can easily be fixed.
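
As a rough illustration of the naming convention described above, the following Ruby sketch splits a post file name into its date and title parts. The parse_post_filename helper and the sample file name are hypothetical; this is not how Jekyll itself is implemented, only a way to see what information the file name carries (a date, but no time).

```ruby
require "date"

# Hypothetical helper: returns [Date, slug] for a YYYY-MM-DD-title.markdown
# file name, or nil when the name does not follow the convention.
def parse_post_filename(name)
  match = /\A(\d{4})-(\d{2})-(\d{2})-(.+)\.markdown\z/.match(name)
  return nil unless match

  year, month, day, slug = match.captures
  [Date.new(year.to_i, month.to_i, day.to_i), slug]
end

date, slug = parse_post_filename("2011-04-18-migrating-to-jekyll.markdown")
puts "#{date} #{slug}"
```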

_config.yml, the configuration file, is not really required, in that the settings it contains can also be specified at runtime as command line arguments for the jekyll command. I prefer keeping in _config.yml those options that I always want to have enabled, while I leave out options that I may want to use or not at runtime, depending on what I am doing; more details in the next section.

Provided you already have some posts and have some layouts ready – we’ll see later how to create both – it’s extremely easy to generate the static site with Jekyll. All you have to do is run the jekyll command from within your blog’s folder. Jekyll will then generate a static copy of your site, ready to be served to users with any web server, in a subfolder named _site. There are various deployment options which we’ll see later, but in theory to deploy a Jekyll-generated site it’s enough to copy the _site folder to the production web server, and that’s it!

Configuration

As you may have guessed by the file’s extension, the configuration is stored in YAML format, which you will certainly be familiar with if you are a Ruby developer. All the settings you can put in there are optional, since -as said- you could also specify the same options as command line arguments for the jekyll binary; the end result is the same (for example, if you wanted to run Jekyll and also start a server at the same time, you’d have to run jekyll --server).
Here’s my current configuration file:

auto: false  
server: false  
lsi: false  
markdown: rdiscount  
pygments: true  
permalink: /:title  
paginate: 10  
category_dir: topics  
category_title_prefix:  
exclude: ["lib", "blog.rb", "Capfile", "config", "log", "Rakefile", "tmp"]  
destination: public  
  • auto: if set to true, when you run the command jekyll it won’t just build the static site and then exit; instead, it will run in foreground and monitor for changes in the posts or the other source files, and will automatically update the static site whenever changes occur; it can be useful for quick testing and debugging while you are either writing posts or making changes to the layouts, for example;
  • server: if set to true, Jekyll will generate the static site and then start a server to serve the static site it has generated (the default is Webrick); it’s not what you should use in production because of scalability issues, however it can be pretty useful in development;
  • lsi: its value (true or false) determines how Jekyll will figure out related posts to associate to each of your posts. The default setting is false, meaning that Jekyll will do a sort of very quick calculation to determine related posts; this way building the static site is really fast, but the results are pretty poor. You can get more accurate related posts by setting this option to true; this way Jekyll will use Latent semantic indexing to find related posts, but it has the downside that the generation of the static site can take a ridiculously long time compared to the other option, so especially with large sites it may not be a good idea (well, depending on how important related posts are to you). I must say that this is one of the very few things I don’t like about Jekyll so far; I used to use a great plugin for this in WordPress, which yields much more accurate results -IMHO- and didn’t really slow down things in a noticeable way;
  • markdown: as anticipated, Jekyll allows you to use Markdown syntax when you write posts (although HTML is also allowed and correctly recognised in the files); this option lets you specify which Markdown filter Jekyll should use. I initially used Maruku since I like its extensions to the original Markdown syntax (it’s a Markdown “superset”); however, unsurprisingly, I had to give up on it because of performance issues. Maruku is a pure Ruby Markdown interpreter, and as such it is very slow compared to the alternatives. The fastest interpreter these days still seems to be Rdiscount, which is for the most part written in C. Unfortunately, using Rdiscount means that you’ll have to cope with a less featured Markdown syntax (and write some HTML by hand here and there); however, the difference in terms of speed is massive, so big that even with a small site you’d be happy with the compromise (here’s a benchmark done by somebody which will give you an idea of the difference). Of course, if you still prefer a richer Markdown superset, the choice is yours; just remember that the static site needs rebuilding each time its contents change, so speed is an extremely important factor here. From my tests, I’ve also noticed that Maruku is a lot more sensitive to syntax mistakes than Rdiscount is, in that it will very easily complain if the syntax is not 100% perfect, making editing a little annoying at times;
  • pygments: Jekyll supports syntax highlighting out of the box (while WordPress for example requires a plugin for this), through Pygments; many languages are supported and there are a lot of themes (basically, CSS styles) to choose from;
  • permalink: in WordPress, you have likely used a custom permalink structure to improve the SEO performance of your blog; of course, you can do this with Jekyll as well with this setting. Many people -me included- agree that the /:title permalink structure is best for SEO, however you need to choose wisely: ideally you should use exactly the same permalink structure you used to use with WordPress, otherwise your blog’s search engine rankings will be severely affected; you can of course change it whenever you want, but you must make sure your site instructs 301-permanent redirects from the old URLs to the new ones, in order to let search engines know in the correct way about the changes. Since Jekyll normally does not use a database nor a server side technology, this would only be possible if you have access to the web server’s configuration for your site (which you don’t, if you are using a third party’s hosting service) and are familiar with that kind of configuration; another option is to “mix” the static site generated by Jekyll with some server side technology so to be able to instruct these redirects with server side code (we’ll see later that I am using Sinatra together with Jekyll, although for other reasons);
  • paginate: it lets you choose how many posts to show per page (Jekyll supports pagination);
  • category_dir: similar to the same option in WordPress to choose the root (in the URL) for your category pages; I have this setting because of an extension which I’ll explain later;
  • category_title_prefix: it allows you to prepend some text to page titles in category pages, such as “Posts filed in ..”; this setting is also there because of an extension;
  • exclude: as you may have noticed earlier, the folders and files used by Jekyll to generate the static site have names that start with the underscore (“_”); this is because Jekyll recognises files and folders that contain files required for the generation of the static files, and will not copy them to the _site subfolder. So if you have files and folders that are required by Jekyll to generate the site or that you just need to keep in the site’s root, but you don’t want them to be available for download by clients, you have two options: the first is to give them names that start with the underscore; the second, when the first one is not possible or if you don’t like that limit, is to set the exclude option to an array of names which will be then also ignored by Jekyll. We’ll see later why I have added those names in my configuration file;
  • destination: as mentioned earlier, by default Jekyll will generate the static site in the _site subfolder, but you can also customise its name. In my case, for example, I am using Capistrano for my deployment tasks, and since Capistrano expects the folders images, javascripts and stylesheets to be under the folder public, otherwise it will show warnings / throw errors, I have configured the destination option to let Jekyll know that I want the static site to be generated in the public subfolder instead.

There may be other options available, but so far this is all I have needed so I haven’t investigated further.
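
To illustrate how settings kept in _config.yml and options given at runtime can end up in one set of values, here is a small Ruby sketch. It is not Jekyll’s own loader: the default_settings and effective_settings helpers, the default values, and the precedence order (defaults, then the file, then runtime overrides) are all assumptions made for this example.

```ruby
require "yaml"

# Illustrative defaults chosen for this sketch; not Jekyll's real default table.
def default_settings
  { "auto" => false, "server" => false, "destination" => "_site" }
end

# Later sources win: defaults, then the config file, then runtime overrides.
def effective_settings(config_yaml, runtime_overrides = {})
  from_file = YAML.safe_load(config_yaml) || {}
  default_settings.merge(from_file).merge(runtime_overrides)
end

config_yaml = <<~CONFIG
  markdown: rdiscount
  permalink: /:title
  paginate: 10
  destination: public
CONFIG

# Equivalent in spirit to keeping "server" out of the file and enabling it at runtime.
settings = effective_settings(config_yaml, { "server" => true })
puts settings.inspect
```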

Importing posts from WordPress

Perhaps the most important step among the first ones during the migration, is to import the posts you already have in your WordPress blog, into Jekyll. Importing essentially means that you will need to read the relevant data from WordPress’ database and use it to dynamically create in the _posts folder as many text files as the number of posts you have. This, of course, unless you are willing to rewrite your posts in Jekyll…

It doesn’t really matter what you use to automate this process, as long as you can create the text files Jekyll expects from the data in your WordPress database. If you, like me, work with Ruby, then Ruby will of course be the natural choice. Before migrating my posts, I saw on Jekyll’s site that there was a reference script in Ruby, but it didn’t cover some things I wanted to do so I wrote my own script; you could use it as it is or as a base for your own customised import process, but you need to make sure you have the required gems installed before proceeding.

Here’s why I used a different import script:

  • I wanted to import more data, such as featured images for posts (introduced with WordPress 3.0 if I remember rightly), categories, and tags;
  • I wanted to import existing drafts and distinguish them from the published posts
  • I wanted to convert the posts to Markdown while migrating; as said earlier, you can leave HTML in the Markdown files, but I didn’t like the idea of having mixed HTML and Markdown posts in my _posts folder, plus I had to cleanup anyway the HTML generated by WordPress and change things like the syntax highlighting code.

You can find the full script in this gist on Github; here I’ll highlight a few things in the code if you are familiar with Ruby.

query = <<-EOS

  SELECT   post_title, post_name, post_date, post_content, post_excerpt, ID, guid, post_status, post_type, post_status, 
        (  SELECT  guid 
           FROM    #{table_prefix}_posts
           WHERE   ID = (  SELECT  meta_value 
                           FROM    #{table_prefix}_postmeta
                           WHERE   post_id = post.ID AND meta_key = "_thumbnail_id") ) AS post_image

  FROM #{table_prefix}_posts AS post
  WHERE  post_type = 'post'

EOS

categories_and_tags_query = <<-EOS

  SELECT         t.taxonomy, term.name, term.slug
  FROM             #{table_prefix}_term_relationships AS tr
  INNER JOIN   #{table_prefix}_term_taxonomy AS t ON t.term_taxonomy_id = tr.term_taxonomy_id
  INNER JOIN   #{table_prefix}_terms AS term ON term.term_id = t.term_id
  WHERE            tr.object_id = %d
  ORDER BY   tr.term_order

EOS  

As you can see, I have used two queries rather than a single query as seen in similar scripts for the importing. This is because as said I also wanted to import more data about posts, and I also wanted to import categories and tags as they were in WordPress; I could have perhaps written a single query anyway, but since I didn’t have many posts to import, it was easier to proceed this way rather than spending more time to figure out a proper join.

...
status     = post[:post_status]  
...
image      = File.basename(post[:post_image]) rescue ""  
...

You can see here that I am also reading the featured image (if any), plus the status. This is because I then want to separate drafts from published posts (the sample script on Jekyll’s site does not take drafts into account).

`wget -O "images/posts/featured/#{image}" "#{post[:post_image]}"` unless File::exists?("images/posts/featured/#{image}") || post[:post_image].nil?

This may look ugly… I could have certainly copied the featured images directly from the other site, but since I had the URLs already in the database, it was quicker to just download the files with the proper name to the expected target destination.

db[categories_and_tags_query % post[:ID]].each do |category_or_tag|  
  eval(category_or_tag[:taxonomy].pluralize) << { 
    "title"    => category_or_tag[:name], 
    "slug"     => category_or_tag[:slug],
    "autoslug" => category_or_tag[:name].downcase.gsub(" ", "-")
  }
end  

Here’s where I use the second query to import categories and tags (both referred to as “taxonomies” in WordPress) for each post. The autoslug is perhaps something temporary as I haven’t decided yet how to manage slugs and titles for categories and tags. In my WordPress blog, these taxonomies had human friendly titles which often differ from their related slugs. So for the time being I am importing everything as is so not to affect SEO, although this means that I am basically storing -in posts- three values for each category or tag (title, slug, and autoslug). The autoslug is basically something that I am using for now for posts published after the migration, and it basically converts the human friendly title into a valid slug for use in URLs; the slug instead, if present, will have priority since it means that it was the slug used in the WordPress blog.
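
The priority rule described above (use the WordPress slug when there is one, otherwise derive a slug from the title) can be sketched in a few lines of Ruby. The autoslug and url_slug helpers and the sample terms are hypothetical names for this illustration; only the downcase/gsub expression comes from the import script.

```ruby
# Same expression the import script uses to derive an "autoslug" from a title.
def autoslug(title)
  title.downcase.gsub(" ", "-")
end

# Prefer the slug carried over from WordPress; fall back to the derived one.
def url_slug(term)
  slug = term["slug"].to_s
  slug.empty? ? autoslug(term["title"]) : slug
end

imported = { "title" => "Ruby on Rails", "slug" => "rails" } # migrated from WordPress
fresh    = { "title" => "Static Sites" }                     # created after the migration

puts url_slug(imported)
puts url_slug(fresh)
```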

data = {  
 'layout'        => 'post',
 'title'         => title.to_s,
 'excerpt'       => post[:post_excerpt].to_s,
 'image'         => image,
 'wordpress_id'  => post[:ID],
 'wordpress_url' => post[:guid],
 'categories'    => categories,
 'tags'          => post_tags
}.delete_if { |k,v| v.nil? || v == ''}.to_yaml

Here you can see that I am also keeping the excerpts, and assigning categories and tags to each post. Some other import scripts I have seen skip the excerpts and dynamically generate a “read more” intro by cutting the first N words from the post’s content, but I prefer well thought-out excerpts, for both usability and SEO.

File.open("#{status == 'publish' ? '_posts' : '_drafts'}/#{name}", "w") do |f|  
  f.puts data
  f.puts "---"
  f.puts content
end  

Finally, once the relevant data has been collected and processed, a text file for each post is created in either _posts or _drafts, depending on whether it was a published post or still a draft. As mentioned previously, Jekyll expects these files to have names in the YYYY-MM-DD-post-title.markdown format, so from the file name it only knows the date a post was published, not the time.

Here’s a warning: if you leave this as it is (instead of implementing a fix or workaround to preserve the original date and time each post was published), beware that this will affect the Atom/RSS feed generated for your site, in that posts won’t have a time associated with their published date, and therefore RSS readers and the like will treat all the old posts as new. So your feed’s subscribers will likely see all the old entries once again in their RSS clients. I realised this after I had completed the migration. If you have many posts, you may want to think about this as it may annoy your readers.
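One possible workaround, which I did not use in my own migration, is to store the full timestamp in the front matter: Jekyll reads a date key from the front matter and uses it instead of the date in the file name. Below is a minimal sketch, assuming the timestamp comes from the post_date column of the WordPress posts table; the helper name front_matter_with_time is hypothetical.

```ruby
require "yaml"

# Hypothetical helper: returns front-matter YAML that keeps the full timestamp.
# `post_date` stands in for the value read from the WordPress posts table.
def front_matter_with_time(title, post_date)
  {
    "layout" => "post",
    "title"  => title,
    "date"   => post_date.strftime("%Y-%m-%d %H:%M:%S %z")
  }.to_yaml
end

published = Time.utc(2011, 3, 28, 21, 15, 0)
puts front_matter_with_time("My gorgeous post", published)
puts "---"
```

In the import script this would amount to adding a 'date' entry to the data hash before it is converted with to_yaml.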

In the last code snippet, you can see that the YAML data generated from the relevant data for a post, is printed at the beginning of the file. This is called YAML Front Matter. Here’s the front matter for my previous post, as an example:

--- 
layout: "post"  
title: "Migrating from WordPress to Jekyll - Part 1: Why I gave up on WordPress"  
excerpt: "Wordpress is a fully featured CMS that makes a great choice for blogging. However, after just three months, I decided WordPress wasn't for me, and started to look for alternatives that would be easier for me to manage and customise as Ruby developer. Enter Jekyll. Here, in this first part, is why I don't think I will ever want to go back to a CMS like WordPress."  
meta-description: "Part one of a two-parts post on migrating my blog from WordPress to Jekyll: the reasons"  
image: "jekyll.png"  
tags:  
- slug: "wordpress"
  title: "Wordpress"
  autoslug: "wordpress"

(cut)  
---

I have cut the other tags and all the categories to save space, since the concept should be clear. There’s one thing I’d suggest you keep in mind when setting the properties: there are some characters you would need to escape when writing string values, or they will break the processing of the page. So it’s just easier to enclose all text values in the front matter within double quotes, so then you know you only need to escape double quotes.
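To see why quoting matters, here is a small Ruby sketch using the standard yaml library. The yaml_quote helper is hypothetical, and the title is just an example of a value with a colon and double quotes in it.

```ruby
require "yaml"

# Hypothetical helper: wrap a value in double quotes for hand-written front
# matter, escaping backslashes and double quotes.
def yaml_quote(text)
  '"' + text.gsub(/["\\]/) { |char| "\\#{char}" } + '"'
end

title = 'Part 1: Why "static" wins'

# Unquoted, the second colon makes the line invalid YAML.
begin
  YAML.load("title: #{title}")
rescue Psych::SyntaxError => e
  puts "unquoted value rejected: #{e.class}"
end

# Quoted and escaped, the same text round-trips unchanged.
parsed = YAML.load("title: #{yaml_quote(title)}")
puts parsed["title"] == title
```

When the front matter is generated with to_yaml, as in the import script, the library takes care of this escaping; the helper only matters for values you write by hand.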

So to import the existing posts from WordPress, you can run the script as follows (in my case I had the script in the lib subfolder):

ruby -r './lib/wordpress' -e 'WordPress::import("your wordpress db name", "db user", "db password", "table name prefix")'  

Provided you have all the dependencies installed, and can access the WordPress database with the credentials you have specified, after running the script you should have as many text files in the _posts and _drafts folders as the published posts and drafts you had in WordPress.

Converting the imported posts to Markdown

This is something that you may or may not want to do, depending on your needs. In my case, I didn’t like the idea of having new posts in Markdown mixed with old posts, imported from WordPress, in HTML. Plus, I had to clean up the HTML imported from WordPress anyway and change things such as the syntax highlighting. So I decided to convert all the existing posts and drafts to Markdown as well, so as to have all posts and drafts stored and managed in the same way.

Luckily, I found a Ruby library, DownmarkIt, ready to use. Unfortunately I still had to apply minor fixes to the Markdown it generated… but all in all it worked quite well; so if you also want to convert WordPress’ posts to Markdown files, grab the library first and put it in the same folder as the import script. Then you’ll have to make two small changes: first, you need to require the library; second, you’ll need to make sure that each imported post is filtered through this library before being written to the destination text file:

%w(rubygems sequel fileutils yaml active_support/inflector).each{|g| require g}

require File.join(File.dirname(__FILE__), "downmark_it")  
...
db[query].each do |post|  
  ...
  content    = DownmarkIt.to_markdown post[:post_content]
  ...
end  
...

That should be it. In my case the Markdown resulting from the conversion wasn’t perfect (perhaps there are better alternatives, but I didn’t want to invest more time on this), so I had to manually fix some little mistakes which would result in errors when parsing these Markdown files. But nothing really time consuming for me since I only had around twenty posts to check.

Layouts and includes

Once I had imported drafts and posts from my WordPress blog, the next step naturally was to recreate the same layout and look in Jekyll. My very first attempt was to reuse the same stylesheets and scripts as in the old blog, and just copy and paste HTML from the blog’s pages (looking at their source) into new Liquid layouts, extracting portions of HTML into includes here and there.

However there was a lot of clutter in the HTML produced by WordPress or, more likely, the plugins I had; plus, the plugins were also the source of some mess with JavaScript and CSS Stylesheets, although I had already done some optimisations for the WordPress blog. So in a word the operation didn’t feel “clean”, and I decided to restart from scratch with regards to layouts and includes, rewriting all the HTML.

As already mentioned earlier, Jekyll uses Liquid to process layouts and includes, and luckily Liquid makes it easy and quick to recreate layouts, nested layouts, and includes in a way that looks similar to how partials work in Rails, so I really like this kind of organisation of the markup in several files.

Jekyll will always parse with Liquid any file in the folders _posts, _includes and _layouts, by default. However Jekyll can also parse through Liquid any other file – regardless of its location – as long as the file contains, at the top, a YAML front matter; so if you want Jekyll to process a file with Liquid before the file gets copied to the static site (and that file is not a layout or include), you need to add a YAML front matter to that file. You don’t have to add actual data to that YAML section; even an “empty” front matter will cause Jekyll to parse that file with Liquid anyway. Example:

---
---

Here's the content of the page....  

The interesting thing here is that any file with a front matter will be parsed with Liquid, and this is useful because it means that even CSS stylesheets and JavaScript files can be so processed, for example if you want to dynamically inject content or change something depending on the value of some variable, or things like these.

Another cool thing is that with Liquid it is extremely easy to include some widget or anyway a portion of HTML into a layout, with the include directive:

<!DOCTYPE html>  
<html lang="en">  
  {% include head.html %}
    <body>
      ...
    </body>
</html>  

head.html will be expected to exist in _includes. Of course, you can include the same file multiple times in the same or different layout or page, or even in another include for nesting.

You can even have nested layouts with the same ease, and this is something I really like because it allows you to create a default layout, and then create variations with only the changes (for example a main layout with sidebars, and a layout without sidebars). Take the following layout, for example:

---
layout: default  
---
This text will be nested into the parent layout named "default".  

For the nested layout to appear in the parent layout, you need to use the content directive in the parent layout, as shown below:

<!DOCTYPE html>  
<html lang="en">  
    <body>

      This is the parent layout. 

      Nested layout's content will appear below:
      {{ content }}
    </body>
</html>  

The same applies to any page using a layout: you just need to specify a layout in the page’s front matter, and make sure the layout has the content directive. I like Liquid because it makes complex layouts extremely simple, since you can have as many includes or nested layouts as you want.

Using static and dynamic data

We’ve already seen that we can add basically any kind of “static” data to a page or layout, by adding it to the YAML front matter. So for example say that we have a post file with this content:

---
layout: post  
title: "My gorgeous post"  
---

Post content here....  

We could have a layout named post (which has been specified in the post) like this:

<!DOCTYPE html>  
<html lang="en">  
    <body>
      <h1>{{ page.title }}</h1>
      <p>{{ content }}</p>
    </body>
</html>  

So you can use the {{ }} syntax for any of the static data you add to the YAML front matter, too, by prefixing its name with page (as in page.title above). This means that each post will have its title (as well as other data), which will then be rendered in the relevant layout.

There’s a lot more you can do with Liquid (and as we’ll see you can even extend it); for more details, the official wiki is the best place to start with. I’ll show more examples along the way.

The “static” data you specify in the YAML front matter is not the only data you can use in your files. When you run the jekyll command to build the static site, Jekyll will automatically generate some hierarchical data on the posts, categories and so on that you can then use in any of the files that will be parsed with Liquid. Remember: you still need to add at least an empty YAML front matter to a file you want to be processed with Liquid, unless that file is in any of these folders: _posts, _includes, _layouts.

For example, in your site’s index page you will want to show the latest posts, right? So you can create an index.html file in the Jekyll folder containing something like this:

---
layout: default  
---
<h1>Latest entries</h1>  
{% for post in site.posts limit:4 %}
<h2>  
    <a href="{{ post.url }}" rel="bookmark" title="Permanent link to {{ post.title }}">{{ post.title }}</a>
</h2>  
<span>{{ post.date | date: '%B' }} {{ post.date | date: '%e' }}, {{ post.date | date: '%Y' }}</span>  
<p>  
    {{ post.excerpt }}
</p>  
{% endfor %}

As you can see, in Liquid you can use loops as well; in the snippet above, you can also spot an example of Liquid functions to manipulate (in this case) dates. This is just a tiny example of course, but we’ll see more of these (again, see Liquid’s wiki for more info).

The most important “dynamic” data that Jekyll makes available when generating a site, and that you can use in your files, is as follows:

  • site.categories, site.tags: return all the categories / tags detected when processing the posts
  • site.categories[page.category]: returns all the posts for the category of the current page; this can be useful when creating category pages, since it can be used to list all the posts for a particular category
  • site.tags[page.tag]: same as above, for tags
  • post.categories, post.tags: the categories / tags assigned to a post
  • page.url, page.title, page.previous, page.previous.url, page.previous.title, page.next, page.next.url, page.next.title: data about the current page and about the previous and next posts
  • site.related_posts: the posts related to the current one

Once you know how to use both static and dynamic data, know which data is available to you, and know that you can also use loops and more in Liquid, you should be able to build your own layouts, includes, etc. You may also want to search GitHub for some public Jekyll repositories and start your own blog from the source code of one of these, although I am afraid all the public examples I have seen are very basic.

When I have a little time, I will publish the source code for this blog as well since it’s a bit more complex than others so it covers more of what could or should be done with Jekyll.

Liquid: some more useful stuff

You already know about placeholders to render data in layouts and pages, and have already seen loops. You may also need to limit the number of items to iterate over in a loop, or to skip some items before iterating over the rest.

Have a look at my index page: as you can see I have 4 featured posts with a particular layout, followed by 6 “teasers”, styled differently. To achieve this, my index.html page only contains this:

{% include featured-posts.html %}
{% assign teasers = site.posts %}
{% assign teasers_skip = 4 %}
{% assign teasers_take = 6 %}
{% include post-teasers.html %}

You’ve already seen include, so you know this means that the featured-posts.html file will be included in that place. But then there’s something new: Liquid also lets you set variables that you can then use in your layouts and pages. This is particularly useful, because it allows you to reuse the same portion of Liquid layout but make it render differently depending on the values of some variables.

So in my home page I first render the featured posts (we’ll see in a minute what’s in that include); then I assign all the site’s posts to the variable named teasers, and in the following two steps I specify in two other variables that I want to skip the first 4 posts (since they will already be rendered as “featured posts”), and that I want to only take the next 6. Finally, the post-teasers.html include is injected.

I use the same include in other layouts as well, but without limiting the number of posts to take or the number of posts to skip, so it’s nice that I can just set variables and then use the same include in multiple places.

Here’s what I have in the featured-posts.html include (I have omitted the HTML for each post since you already know how to render a post’s data from the previous section):

{% for post in site.posts limit:4 %}
...
(featured post's content here)
...
{% endfor %}

You can see that I limit the loop to the first 4 posts. And here’s the content of the post-teasers.html include:

{% for post in teasers offset:teasers_skip limit:teasers_take %}
...
(teaser's content here)
...
{% if forloop.last %}
{% else %}
    {% cycle '', '<div class="teaser-separator"></div>' %}
{% endif %}
{% endfor %}

Here I specify both how many posts to skip and how many to take.

In the snippet above you can also see something that may look weird. My home page renders teasers in two columns, so to achieve this I alternately inject, after each teaser, an empty string (so, basically nothing) or a div with the following CSS styles:

.teaser-separator {
    clear: both;
    border-bottom: 1px solid #CCCCCC;
}

By adding that div with these styles, I force every odd teaser on the left, and every even teaser on the right. Luckily Liquid supports both the typical if..then..else construct, as well as the forloop.last and cycle methods, used the way shown above.
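If you think in Ruby rather than Liquid, the limit, offset and cycle logic above maps roughly onto take, drop and each_slice. The sketch below only illustrates the semantics with placeholder strings standing in for site.posts; it is not code used by the blog.

```ruby
# Stand-ins for site.posts; in Jekyll these would be post objects, newest first.
posts = (1..12).map { |n| "post-#{n}" }

featured = posts.take(4)           # like: for post in site.posts limit:4
teasers  = posts.drop(4).take(6)   # like: offset:teasers_skip limit:teasers_take

# Two teasers per row, with a separator after every row except the last,
# which is what the cycle / forloop.last combination produces.
rows = teasers.each_slice(2).to_a
rows.each_with_index do |row, index|
  puts row.join(" | ")
  puts "-- separator --" unless index == rows.size - 1
end
```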

Another thing useful to know is how to render links to the previous and next posts (if any). This is really good for SEO since it helps with internal linking. Below is the markup I am using at the moment, which should be pretty self explanatory.

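{% if page.previous.url %}
<a href="{{ page.previous.url }}">« {{ page.previous.title | truncatewords: 15 }}</a>
{% endif %}
{% if page.next.url %}
<a href="{{ page.next.url }}">{{ page.next.title | truncatewords: 15 }} »</a>
{% endif %}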

Next, here’s how I render related posts on my post pages (again, it should be pretty clear how it works, so I’ll just paste it for reference):

Important: if you use the --lsi option to get more accurate related posts (at the cost of a slower generation of the static site), you’ll see that Jekyll suggests:

Notice: for 10x faster LSI support, please install http://rb-gsl.rubyforge.org/

So you could speed things up by just installing the gsl gem. However if you just try to install the gem, you may see errors since the gem requires a gsl package as well.

On Mac OS X

This is what I got at first on my Mac, which I use for the development:

$ gem install gsl  
Fetching: narray-0.5.9.9.gem (100%)  
Building native extensions.  This could take a while...  
Fetching: gsl-1.14.7.gem (100%)  
Building native extensions.  This could take a while...  
ERROR:  Error installing gsl:  
    ERROR: Failed to build gem native extension.

        /Users/vito/.rvm/rubies/ree-1.8.7-2011.03/bin/ruby extconf.rb
extconf.rb:65: command not found: gsl-config --version  
*** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of  
necessary libraries and/or headers.  Check the mkmf.log file for more  
details.  You may need configuration options.

Provided configuration options:  
    --with-opt-dir
    --without-opt-dir
    --with-opt-include
    --without-opt-include=${opt-dir}/include
    --with-opt-lib
    --without-opt-lib=${opt-dir}/lib
    --with-make-prog
    --without-make-prog
    --srcdir=.
    --curdir
    --ruby=/Users/vito/.rvm/rubies/ree-1.8.7-2011.03/bin/ruby
extconf.rb:237: Check GSL>=0.9.4 is installed, and the command "gsl-config" is in search path. (RuntimeError)  
checking gsl version... 

Gem files will remain installed in /Users/vito/.rvm/gems/ree-1.8.7-2011.03@global/gems/gsl-1.14.7 for inspection.  
Results logged to /Users/vito/.rvm/gems/ree-1.8.7-2011.03@global/gems/gsl-1.14.7/ext/gem_make.out  

It turned out that I needed to install the gsl package first. It’s easy if you use for example Homebrew:

$ brew install gsl  
==> Downloading ftp://ftp.gnu.org/gnu/gsl/gsl-1.14.tar.gz
######################################################################## 100.0%
==> ./configure --prefix=/usr/local/Cellar/gsl/1.14 --disable-dependency-tracking
==> make
==> make install
/usr/local/Cellar/gsl/1.14: 243 files, 8.6M, built in 4.1 minutes

Then I could install the gem without any errors:

$ gem install gsl  
Building native extensions.  This could take a while...  
Successfully installed gsl-1.14.7  
1 gem installed  
Installing ri documentation for gsl-1.14.7...  
Installing RDoc documentation for gsl-1.14.7...  

Linux

I got a similar error when I did the first setup for my new Jekyll blog on my production server, and the fix was also similar; here’s what I had to install on Ubuntu 10.04 server:

aptitude install libocamlgsl-ocaml-dev libgsl-ruby1.8  

Once you have these dependencies sorted out, Jekyll should rebuild the site noticeably faster with the --lsi option than before (it does for me).

Lastly, I wanted to mention a few other Liquid directives I find useful to chain or transform data for rendering:

{% capture post_url %}{{ page.url | prepend:'http://vitobotta.com' | append:'/' }}{% endcapture %}
{% capture encoded_post_url %}{{ post_url | urlencode }}{% endcapture %}
{% assign post_title = page.title  %}
{% capture encoded_post_title %}{{ post_title | urlencode }}{% endcapture %}
{% capture encoded_post_image_url %}{{ page.image | prepend:'http://vitobotta.com/images/posts/featured/' | urlencode }}{% endcapture %}

As you can see, it’s pretty easy to chain data in Liquid with capture: everything between capture’s opening and closing tags is concatenated and assigned to the variable specified in the opening tag. You can also see here some more examples of Liquid functions, in this case to prepend and append text. Then there’s the custom function urlencode; I’ll show later how I extended Liquid with this function.

All Liquid functions, as you can see, are used by appending a pipe to the data to transform, followed by the function’s name and any arguments it may require.
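To give an idea of what sits behind a custom function such as urlencode, here is a minimal sketch of a Liquid filter written as a Ruby module. It is an illustration rather than the exact code this blog uses: the module name UrlEncodeFilter is hypothetical, and it assumes the file lives in Jekyll’s _plugins folder so that it gets loaded when the site is built.

```ruby
require "uri"

# Hypothetical plugin module; Jekyll loads Ruby files placed in _plugins.
module UrlEncodeFilter
  # Percent-encode a value so it can be embedded in a query string.
  def urlencode(input)
    URI.encode_www_form_component(input.to_s)
  end
end

# Register the filter only when Liquid is loaded (as it is inside Jekyll).
Liquid::Template.register_filter(UrlEncodeFilter) if defined?(Liquid::Template)

# Outside Jekyll the method can be exercised directly:
encoder = Object.new.extend(UrlEncodeFilter)
puts encoder.urlencode("http://vitobotta.com/some post/")
```

Any public method of a registered module becomes available after the pipe, with the piped value passed as the first argument. Jekyll also ships a built-in cgi_escape filter, which may be enough if you don’t need anything custom.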

Comments

One thing that should be pretty clear by now is that, because Jekyll only generates a static site and doesn’t use a database or any dynamic content built with server-side technologies, it doesn’t support comments out of the box. There are two ways to fix that: either you use some server-side technology and integrate the management of comments into Jekyll’s static pages one way or another, or (the recommended option) you just outsource comments.

Services like Disqus, Intense Debate and similar have become pretty popular these days and are great for comments, since they offer a much better experience than what you’d normally have with WordPress. With WordPress you could improve the built-in comments feature with plugins, but that often means a heavier WordPress install, and generally speaking it’s hard to achieve that way the same user experience that those third-party services offer with minimum effort. Even Facebook have recently revamped their comments plugin and it’s now pretty good, so much so that even some large sites have started to use it. However, I prefer Disqus (I have never liked Intense Debate too much), because of both its nice integration of what they call social media reactions, and the fact that it allows users to log in with multiple authentication providers, as opposed to Facebook, which at the moment (I think) only supports authentication with Facebook credentials.

Regardless of which one you choose, there’s a few things you should know if you aren’t yet familiar with outsourcing a blog’s comments to a separate service:

  • since comments are -obviously- stored elsewhere, they will be loaded into your pages through a JavaScript remote call; it is very easy to integrate a third party commenting service since it only requires you to add some JavaScript code snippet to your layouts, but this also means that comments won’t be already present in the page while the page loads and is rendered, so comments may appear more slowly or not at all in the event the service you use is down;
  • the fact that comments are not present in the HTML documents of your pages, means that those comments basically “don’t exist” for search engines. Search engines, in fact, will only index the HTML of your pages, not JavaScript code nor JSON data that is loaded from the remote service to render your comments in the client’s browser; so this may be a negative point with regards to SEO depending on your point of view; some people think that comments may distract from the focus on the topic of a page (in the context of SEO), so perhaps it is better that they are excluded or ignored when search engines’ spiders index the site; other people instead think that the content of comments may also help drive search traffic to the site; if you think that comments add to the SEO of your site, it’s good to know that some services -like Disqus- create a comments page on their site for each page of your site that implements comments, and their pages basically instruct search engines of the canonical location of the comments, that is your own pages. So in theory this should fix some of the SEO related issues, and your site should be able to claim back some precious link juice;
  • while all these services usually require you to include just one script in your layouts, when comments are actually loaded these services make a fairly high number of additional HTTP requests to load whatever is needed to render the comments in your pages; this can affect the general client-side performance of your site, so you should take it into consideration because a) users may perceive slowness, b) search engines now also take client-side speed into account in the algorithms that calculate rankings;
  • most of these external commenting services, when used with WordPress through plugins, also take care of synchronising all the comments between your WordPress install and their own data store; this means that a) you will always have a local “backup” of all your comments, although they are managed by an external service; b) if the commenting service is down at some time, you can temporarily switch it off and restore WordPress’ built-in commenting until the problem is fixed, so your readers would have a limited user experience but the commenting feature wouldn’t be completely lost. This of course is not possible with Jekyll out of the box, since Jekyll doesn’t even have a database. However, nothing stops you from making a local backup of all the comments every so often; most of these services (if not all) also offer APIs that you can use to download and synchronise comments with a local database, although this of course requires that you are familiar with some programming language. I am planning to do something like this in Ruby ASAP (unless there’s something ready for the purpose; I haven’t investigated this yet), and will then post about it when done;
  • if you have always used WordPress’ built-in commenting feature, you will have all your comments in your WordPress database, so you will need to import all those comments into Disqus or another service before they can be available for your Jekyll site. In my case I didn’t need to do this, as I had used Disqus from the beginning (so Disqus already had all my comments when I migrated), but if you need to, the easiest way is to install Disqus’ plugin in your WordPress blog and let it synchronise all the comments to Disqus’ servers; this may take anywhere from a few minutes to hours or days depending on how many comments your blog has.

So, how do you integrate one of these services in your Jekyll blog? I’ll show here how easy it is with Disqus, so far my favourite. You can get the necessary info from your Disqus account, however here’s what I had to do as an example and for quicker reference.

First, I created the file _includes/comments.html, making sure it contains a div element with id set to disqus_thread, since Disqus will render the comments in this element; then I specified the id of my blog in Disqus (which is vitosjournal) and the URL to use as the association between some comments and the page where those comments should appear. You can see I used a simple Liquid placeholder with the page.url that Jekyll makes available for each of your pages. Finally, I have included Disqus’ script:

<div>
  <h3>Have your say!</h3>
  <p>Please see my comment policy if this is your first time here or if you have any questions regarding comments.</p>
  <div id="disqus_thread"></div>
  <script type="text/javascript">
    var disqus_shortname = 'vitosjournal';
    var disqus_url = "http://vitobotta.com{{ page.url }}/";

    /* * * DON'T EDIT BELOW THIS LINE * * */
    (function() {
      var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
      dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
      (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
    })();
  </script>
  <noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
  <a href="http://disqus.com" class="dsq-brlink">blog comments powered by <span class="logo-disqus">Disqus</span></a>
</div>

You can obtain a snippet similar to the one above from within your Disqus account (following Install => Universal Code). That’s it: you should now see the comments rendered correctly in your pages.

A second step was to get the View comments link next to each featured post or teaser (in the index or in the archive pages), to display the number of comments (and, for Disqus, of social media reactions) stored for each of them. You can achieve this by making sure that

a) you add another script which will take care of this:

(function () {
    var s = document.createElement('script'); s.async = true;
    s.type = 'text/javascript';
    s.src = 'http://' + disqus_shortname + '.disqus.com/count.js';
    (document.getElementsByTagName('HEAD')[0] || document.getElementsByTagName('BODY')[0]).appendChild(s);
}());

b) each link to comments has a URL ending with the #disqus_thread fragment identifier; Disqus’ script will recognise these links and, by default, use the URL in the link’s href attribute as the unique identifier for the comments count to display:

...
<span class="comments-link"><a href="http://vitobotta.com{{ post.url }}/#disqus_thread" rel="nofollow">View Comments</a></span>  
...

You can optionally add an attribute named data-disqus-identifier if you want to specify a different URL or use some other piece of information as the unique identifier for your comments. A word of warning: if you specify data-disqus-identifier, make sure the identifier is the correct one and that Disqus recognises it. In my case this didn’t work, even though I had taken the same identifiers I could see in my WordPress blog (I was using Disqus’ plugin already); however, I haven’t had any issues whatsoever using the normal URLs instead.

That’s all for comments if you go for Disqus, but the process should be pretty similar with the other services too.

Search engine optimisation – Creating an XML site map

WordPress is not bad from an SEO point of view out of the box, and you, like me, may have further improved its SEO performance with an SEO-oriented theme (mine was Thesis) and various plugins. With Jekyll, you need to take care of everything SEO-related yourself.

At least for the basics, here’s what I did: I first created an include, meta-tags.html, containing (among other things) these meta tags:

...
<meta name="robots" content="{{ page.meta-robots }}" />  
<meta name="description" content="{{ page.meta-description }}" />  
...

Then, in each page’s or post’s YAML front matter, I added the relevant information. Here’s, for example, the full content of my index.html:

---
layout: "default"  
title: "Tips and walkthroughs on web technologies and digital life – Vito’s journal"  
meta-description: "Vito Botta’s tech blog with tips and walkthroughs on web technologies and digital life. Main topics covered are web development with free and open source tools (such as Linux, Ruby, Rails, MySQL), performance & scalability, computer security, database, search engine optimisation (SEO). Vito is a passionate web developer living and working in London, UK. His roles as analyst, developer and technology enthusiast overlap here on his web log."  
meta-robots: "noodp, noydir"  
---
{% include featured-posts.html %}
{% assign teasers = site.posts %}
{% assign teasers_skip = 4 %}
{% assign teasers_take = 6 %}
{% include post-teasers.html %}

  

As you can see, besides the title (recommended max length: 65-70 characters) and the meta description (recommended max length: ~150 characters), I also specify for each page how search engine’s robots should treat it. The possible values of the meta-robots setting:

  • noodp: instructs search engines not to use the Open Directory Project as a source when generating descriptive snippets, as this may yield unexpected results;
  • noydir: same as above, but with regards to Yahoo! Directory;
  • noindex: tells spiders not to index the current page;
  • noarchive: tells spiders not to cache/archive the contents of the current page.

I suggest you look up these settings if you aren’t familiar with them. I have noodp and noydir on all my pages, while on pages that I want search engines to ignore I also add noindex and noarchive. This way, I can make sure search engines only have relevant information on my posts rather than duplicate content or information that would only look like clutter in the context of a search.
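Since the title and meta description live in each file’s front matter, it is easy to check them against the recommended lengths mentioned above with a few lines of Ruby. The sketch below is only an illustration: the limits and the helper names front_matter and seo_warnings are assumptions, and it works on a string rather than walking the site’s folders.

```ruby
require "yaml"

TITLE_LIMIT       = 70   # upper end of the 65-70 character guideline
DESCRIPTION_LIMIT = 150  # approximate guideline for meta descriptions

# Extract the front matter hash from the text of a page or post.
def front_matter(text)
  match = text.match(/\A---\s*\n(.*?)\n---\s*\n/m)
  match ? (YAML.load(match[1]) || {}) : {}
end

# Return a list of warnings for over-long titles and descriptions.
def seo_warnings(text)
  data = front_matter(text)
  warnings = []
  warnings << "title too long" if data["title"].to_s.length > TITLE_LIMIT
  warnings << "meta-description too long" if data["meta-description"].to_s.length > DESCRIPTION_LIMIT
  warnings
end

sample = <<~PAGE
  ---
  layout: "default"
  title: "A short title"
  meta-description: "#{'x' * 200}"
  ---
  Page content
PAGE

puts seo_warnings(sample).inspect
```

To run it over a real site you would read each file in _posts (and any other page with a front matter) and pass its content to seo_warnings.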

Besides meta-tags.html, I also have another include, link-rel.html, which contains links to stylesheets but also a link tag for canonical URLs:

...
<link rel="canonical" href="{{ page.url | canonical }}" />  
...

As you can see, I am using the custom Liquid function canonical (we’ll see later how you can define custom functions), to adapt the page URL in some way. Please ignore that function for now since it’s something a bit more specific to my blog, the important thing is that you put in there the correct canonical URL for each page (in the YAML front matter, of course, so that include can be added to all the layouts with just a Liquid placeholder).

Canonical URLs help avoid issues with duplicate content in a number of cases, since duplicate content may affect the SEO performance of a site. In my WordPress blog, canonical URLs were very important since I often had very long posts (such as the one you are reading) that, because of their length, I had split into several sections, each on a different page. In that case, having the canonical URL of every page point to the first page of the article would ensure that search engines pushed the ranking of the first page rather than diluting the SEO value of the article across its pages.

In reality, even with this kind of configuration there may still be problems with search engines. In fact, in Google’s Matt Cutts’ words:

A canonical page is the preferred version of a set of pages with highly similar content.

Also:

Must the content on a set of pages be similar to the content on the canonical version?
Yes. The rel=”canonical” attribute should be used only to specify the preferred version of many pages with identical content (although minor differences, such as sort order, are okay).

So Google makes it pretty clear that canonical URLs help avoid issues with duplicate content that may penalise the ranking of your pages, but all the pages pointing to the same canonical URL must have identical (or almost identical) content. This of course isn’t the case when a post is split into different pages, so I had to change my plans with regard to in-post pagination, and I would suggest you do the same if you come from a similar WordPress setup.

Besides SEO, I have received enough feedback from readers to understand they don’t usually like to have to load different pages just to read the entire content of an article. It’s a slower reading experience for them and they may think you do this just to increase the number of page views, although you may instead do this simply to organise longer posts.

So, it looks like I had enough reasons to drop in-post pagination altogether. Now, long posts on this blog (such as the one you are reading) are still organised in sections, but each post is actually just a single, long page with in-page hyperlinks. User experience should now be improved, and the site should be on safer ground as far as SEO is concerned.

Back to the canonical URLs: I am still using them even though I no longer split long posts into multiple pages, and again I would suggest you do the same. Why? It is just safer in case you happen to have identical (or very similar) content on more than one page, but that’s not the only reason. Do you, like me, use a CDN? If yes, you may be surprised to hear that depending on the CDN you use, and on how you use it, the CDN may affect the SEO performance of your site! Try a site:your-cdn-hostname.com search in Google, and check for yourself whether you have CDN-cached copies of your pages and posts that directly compete (as far as SEO goes) with your site! If yes, you clearly have a duplicate content issue there.

Make sure that your CDN allows you to specify a custom robots.txt for your CDN distribution that will remain as you define it regardless of the content of your site’s own robots.txt. So for example in the root of my CDN distribution I have a robots.txt file with the following content:

User-agent: *  
Disallow: /  

This tells spiders to completely ignore the content of the whole CDN distribution (or “pull zone” if you are using a service similar to the one I use, MaxCDN). Instead, the robots.txt in the actual site that I do want spiders to index contains:

User-agent: *  
Disallow:  
Sitemap: http://vitobotta.com/sitemap.xml  

This allows spiders to index the whole site without restrictions instead. You can also see that I am specifying the location of my site’s XML sitemap. In WordPress, you have likely -hopefully- used a plugin to create this site map that you could then let Google know about (optionally also through the Google Webmaster Tools).
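
To make the difference between the two robots.txt files above concrete, here is a minimal Ruby sketch of how a well-behaved spider might read them. It is a deliberate simplification written for this post: the crawl_allowed? function is hypothetical, it only understands the `User-agent: *` group and plain Disallow prefixes, and real crawlers support more directives than this.

```ruby
# Minimal sketch: does a robots.txt allow crawling the given path?
# Only handles "User-agent: *" and Disallow prefixes (an assumption made
# for illustration; real crawlers also handle Allow, wildcards, etc.).
def crawl_allowed?(robots_txt, path)
  applies = false
  robots_txt.each_line do |line|
    field, value = line.split(":", 2)
    field = field.to_s.strip.downcase
    value = value.to_s.strip
    case field
    when "user-agent"
      applies = (value == "*")
    when "disallow"
      # An empty Disallow value blocks nothing.
      return false if applies && !value.empty? && path.start_with?(value)
    end
  end
  true
end

cdn_robots = <<~ROBOTS
  User-agent: *
  Disallow: /
ROBOTS

site_robots = <<~ROBOTS
  User-agent: *
  Disallow:
  Sitemap: http://vitobotta.com/sitemap.xml
ROBOTS

puts crawl_allowed?(cdn_robots, "/some-post/")   # CDN copy: blocked
puts crawl_allowed?(site_robots, "/some-post/")  # real site: allowed
```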

It is a recommended task, since the XML site map helps Google understand the content you care about, and you can even tell it (as well as the other search engines I suppose) the priority with which it should index your pages and posts. You can easily have a fully functional XML site map with Jekyll too, by simply creating a sitemap.xml file in the root of your Jekyll folder. Here’s the content of mine:

---
---
<?xml version="1.0" encoding="UTF-8"?>  
<?xml-stylesheet type="text/xsl" href="/sitemap.xsl"?>  
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">  
  <url>
    <loc>http://vitobotta.com/</loc>
    <lastmod>{{ site.time | date_to_xmlschema }}</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://vitobotta.com/cv-resume/</loc>
    <lastmod>2011-02-07T20:59:36+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.6</priority>
  </url>
  {% for post in site.posts limit:20 %}
  <url>
    <loc>http://vitobotta.com{{ post.url }}/</loc>
    <lastmod>{{ post.date | date_to_xmlschema }}</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  {% endfor %}
</urlset>  

You can see that, besides the home page, I have specified only my Ruby CV page (you may want search engines to index other pages as well) and the 20 most recent posts, assigning a higher priority to posts than to the CV page. The changefreq values ask search engines to come back and check the home page for changes every day and the other pages every week (this is only an indication of the frequency, you can’t really decide). You can also see that I have added an empty YAML front matter to ensure the file is processed with Liquid when Jekyll rebuilds the static site.
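
If it helps to see what the Liquid loop expands to, here is a rough plain-Ruby equivalent for a single post. It is a sketch only: SitemapPost is a stand-in for Jekyll's post objects, the post URL is made up, and Ruby's Time#xmlschema prints UTC times with a Z suffix, which may differ cosmetically from what Jekyll's date_to_xmlschema filter outputs.

```ruby
require "time"

# Stand-in for a Jekyll post; only the two fields the sitemap needs.
SitemapPost = Struct.new(:url, :date)

# Renders one <url> entry, mirroring the Liquid loop in sitemap.xml.
def sitemap_entry(post, base: "http://vitobotta.com", changefreq: "weekly", priority: "0.8")
  <<~XML
    <url>
      <loc>#{base}#{post.url}/</loc>
      <lastmod>#{post.date.xmlschema}</lastmod>
      <changefreq>#{changefreq}</changefreq>
      <priority>#{priority}</priority>
    </url>
  XML
end

# Hypothetical post, for illustration only.
posts = [SitemapPost.new("/hypothetical-post", Time.utc(2011, 2, 7, 20, 59, 36))]

# `first(20)` plays the role of `limit:20` in the Liquid template.
puts posts.first(20).map { |post| sitemap_entry(post) }.join
```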

Sometimes I check that the site map is as expected after making changes to the site, by loading it directly in my browser. Since the file will be plain XML in the end, I have specified an XSL stylesheet to make it look more user friendly when displayed in a browser. You can find my sitemap.xsl file in this gist if you want to do the same.

Syntax Highlighting

One of the nice things that Jekyll can handle out of the box (while in WordPress you’ll need a plugin) is syntax highlighting. As mentioned in the Configuration section, Jekyll uses Pygments for this; all you need to do is turn the feature on, either by setting pygments to true in your _config.yml or with a command line argument when you run jekyll to rebuild the static site.

It’s very easy to let Jekyll know you want to render a portion of text in your posts, as code:

...
some Markdown or HTML...  
...
{% highlight html %}
YOUR CODE HERE  
{% endhighlight %}
...

Besides HTML, Pygments supports a long list of languages.

One optional step I recommend is generating the CSS stylesheet for the theme you choose to use with Pygments. In my case, I didn’t much like the (many) themes available by default, and I went instead for a Railscasts-inspired theme which I like much more.

First, I had to install the new theme so that Pygments would recognise it:

git clone git://github.com/DrMegahertz/pygments-style-railscasts.git

cd /Users/vito/Documents/Code/OpenSource/pygments-style-railscasts

./setup.py install

Then, to check that all went well, I started Python’s console and entered the commands you see below:

python

>>> from pygments.formatters import HtmlFormatter
>>> HtmlFormatter(style='railscasts').style
<class 'pygments_style_railscasts.RailscastsStyle'>  

If you see that class returned, all is well. You can now export the correct styles to a new CSS stylesheet that you’ll then have to include in your layouts:

pygmentize -S railscasts -f html > railscasts.css  

Atom/RSS feed

Similarly to what we’ve seen for the XML sitemap, when you switch from WordPress to Jekyll you’ll also have to remember that the Atom / RSS feed for your blog won’t be automatically generated and updated for you. However, it’s as easy to have a feed with Jekyll as it is to have an XML site map.

Just create a file with the same name and location of the feed in your WordPress blog (usually /feed/), and paste the code below into it:

---
---
<?xml version="1.0" encoding="utf-8"?>  
<feed xmlns="http://www.w3.org/2005/Atom">  
    <title type="text" xml:lang="en">Vito's journal</title>
    <link type="application/atom+xml" href="http://vitobotta.com/feed/" rel="self"/>
    <link type="text" href="http://vitobotta.com" rel="alternate"/>
    <updated>{{ site.time | date_to_xmlschema }}</updated>
    <id>http://vitobotta.com</id>
    <author>
        <name>Vito Botta</name>
    </author>
    <rights>Copyright (c) 2010-2011 Vito Botta</rights>
    {% for post in site.posts limit:20 %}
    <entry>
        <title>{{ post.title | xml_escape }}</title>
        <link href="http://vitobotta.com{{ post.url }}/"/>
        <updated>{{ post.date | date_to_xmlschema }}</updated>
        <id>http://vitobotta.com{{ post.url }}/</id>
        <summary type="html">{{ post.excerpt | xml_escape }}</summary>
    </entry>
    {% endfor %}
</feed>  

That’s it. Jekyll will keep this file updated whenever you publish new posts and clients will know about them.

Remember, however, that Jekyll doesn’t handle the time when posts are published, and this may cause your readers to see old posts coming back as soon as they hit your new feed for the first time (as mentioned earlier). In my case I realised this after I had already migrated, so it was too late.

If you want, it’s not difficult to fix: just add the time to the YAML front matter of each post, and then add a Liquid place holder for it in the feed template you see above. This should work, although I haven’t tested it.
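
A rough sketch of that idea in plain Ruby, equally untested against Jekyll itself: it assumes a hypothetical time key (HH:MM, treated as UTC) in each post's front matter and combines it with the post date to produce the timestamp for the entry's updated element. The key name and the published_at helper are illustrative assumptions, not Jekyll built-ins.

```ruby
require "yaml"
require "date"
require "time"

# Assumed front matter layout; the `time` key is hypothetical.
# The value is quoted so that YAML reads it as a string.
front_matter = YAML.safe_load(<<~FRONT_MATTER)
  title: "A hypothetical post"
  time: "14:30"
FRONT_MATTER

# Combines the post's date with the HH:MM time from the front matter.
# Falls back to midnight when no time is given.
def published_at(date, front_matter)
  hours, minutes = front_matter.fetch("time", "00:00").split(":").map(&:to_i)
  Time.utc(date.year, date.month, date.day, hours, minutes)
end

puts published_at(Date.new(2011, 2, 7), front_matter).xmlschema
```

With a time on every post, feed readers have something more precise than the date alone to tell entries apart.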

Extending Jekyll: plugins and extensions

One of the most interesting characteristics of Jekyll is that – provided you know your way around Ruby – you have full control over what it can do, in three ways. First, the most obvious one: you can just fork the code and make whichever changes you like, and then use your version of the gem rather than the original one.

However, while this is easy, I prefer the other two options, plugins and extensions, since they let me achieve the same without having to change Jekyll’s code. There are two main advantages with this:

  • you can update Jekyll’s gem whenever there’s a new version, without having to mess with the code because of the changes you had made on the previous version;
  • both plugins and extensions live in code files that you can easily share across projects.

Plugins

Jekyll expects your plugins to live in the _plugins folder. So far I have needed plugins for two particular needs:

  • to make Jekyll generate custom page types that it wouldn’t generate otherwise; for example, think of the archive pages (category pages, tag pages, monthly archives and so on), since Jekyll, surprisingly IMHO, doesn’t generate them by default even though it understands categories, tags, etc;
  • to register new, custom Liquid tags that can be used to pull in whatever content you want, in any place of your layouts, with just a single placeholder as we have seen for content, for example.

Archive pages

I was lucky enough to find some useful examples of plugins here; I basically took the plugin to generate category pages and customised it slightly since -as you may remember- each category in my site has multiple properties (slug, title and autoslug) vs a single text value. Then I created other plugins from that one to generate tag pages and monthly archives.

Besides these, I have also created a couple of other plugins:

Tag cloud

The first custom plugin I wrote lets me register the custom Liquid tag tag_cloud. If used as we’ve seen for other placeholders,

...
{% tag_cloud %}
...

it will render... you guessed it... a tag cloud. Here’s the code of the plugin:

module Jekyll  
  class TagCloud < Liquid::Tag
    safe = true

    def render(context)
      tags = context.registers[:site].tags.map{|tag| 
        { 
          "title"    => tag[0]["title"], 
          "slug"     => tag[0]["slug"], 
          "autoslug" => tag[0]["autoslug"], 
          "posts"    => tag[1] 
        } 
      }

      # Smallest and largest number of posts per tag.
      counts    = tags.map{|tag| tag["posts"].length }
      min_count = counts.min
      max_count = counts.max

      # Scale each font size linearly between 75% and 280%; the spread is
      # clamped to 1 to avoid dividing by zero when all tags have the same count.
      spread  = [max_count - min_count, 1].max
      weights = tags.inject({}){|result, tag| result[tag["title"]] = ( ((tag["posts"].length - min_count) * (280 - 75)) / spread ) + 75; result }

      tags.inject("") { |html, tag|
        html << "<span style='font-size: #{sprintf("%d", weights[tag['title']])}%'><a href='/tags/#{tag['slug'] || tag['autoslug']}/'>#{tag["title"]}</a></span>\n"
        html
      }
    end
  end
end

Liquid::Template.register_tag('tag_cloud', Jekyll::TagCloud)  

As you can see, it’s different from the other plugins in that it only registers the Liquid tag and doesn’t create any files. If you’re curious how I calculate the “weight” for each tag depending on the number of posts associated with it, I am now using a formula I found here after some googling.
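
The weight calculation is easier to follow in isolation. The sketch below pulls the same arithmetic out of the plugin into a standalone function: a tag's post count is scaled linearly between a minimum font size of 75% and a maximum of 280%, using integer division. The function name tag_weight is made up for this example, and the guard against a zero spread keeps the sketch safe when every tag has the same number of posts.

```ruby
# Linear scaling of a tag's post count to a font size in percent.
# Same arithmetic as the plugin: 75% for the least used tag, 280% for the most used.
def tag_weight(count, min_count, max_count, min_size = 75, max_size = 280)
  spread = max_count - min_count
  return min_size if spread.zero?
  ((count - min_count) * (max_size - min_size)) / spread + min_size
end

puts tag_weight(1, 1, 10)   # least used tag  -> 75
puts tag_weight(5, 1, 10)   # (4 * 205) / 9 + 75 -> 166
puts tag_weight(10, 1, 10)  # most used tag   -> 280
```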

Speeding up syntax highlighting

Another very useful plugin I was glad to have found helps speed up the syntax highlighting. I found it here. It basically caches code that has already been processed with Pygments to files in the _cache folder, so next time Jekyll has to rebuild the static site it won’t have to reprocess the syntax highlighting for code snippets that are already cached and haven’t changed (the plugin checks the MD5 hash of each code snippet).

Since I don’t have many posts yet, perhaps it doesn’t make much difference for me, but since Pygments can be quite slow -I think- with large code snippets, I can see how it may help speed things up on larger blogs with lots of code snippets.
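
The caching idea itself is simple enough to sketch in a few lines of Ruby. This is not the plugin's code, just an illustration of the technique under my own assumptions: the function name cached_highlight is hypothetical, and the block passed to it stands in for the slow call to Pygments.

```ruby
require "digest/md5"
require "fileutils"
require "tmpdir"

# Sketch of a content-addressed cache: the block runs only when there is
# no cached result for this exact snippet (keyed by its MD5 hash).
def cached_highlight(code, cache_dir: "_cache")
  FileUtils.mkdir_p(cache_dir)
  path = File.join(cache_dir, "#{Digest::MD5.hexdigest(code)}.html")
  return File.read(path) if File.exist?(path)

  html = yield(code)  # the slow step, e.g. a call to Pygments
  File.write(path, html)
  html
end

# Usage with a temporary cache folder and a stand-in highlighter.
Dir.mktmpdir do |dir|
  2.times do
    # The second iteration is served from the cache file.
    puts cached_highlight("puts 1", cache_dir: dir) { |code| "<pre>#{code}</pre>" }
  end
end
```

Because the file name is derived from the snippet's content, editing a snippet automatically produces a new cache entry, and unchanged snippets are never highlighted twice.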

Extensions

Plugins aren’t the only way to extend Jekyll without touching its code. I found out here about a Ruby gem named jekyll_ext that lets you achieve similar results but in a slightly different way. Extensions are expected to live in _extensions, and work by changing or adding features through filters, thanks to metaprogramming – see this for more details.

At the moment I am using a single extension, and over the past few days I have realised that plugins are kind of the “official” way of extending Jekyll, so perhaps it may be pointless to use both plugins and extensions; however at the moment I am happy with my setup so I haven’t experimented for example with converting the single extension I currently have into a plugin.

There is one reason, though, why I may look into using just plugins in the future: to make extensions work you need to generate the static site by running the wrapper ejekyll (installed together with the jekyll_ext gem) rather than jekyll. It’s just one more thing to remember, but it makes plugins a slightly cleaner way of extending Jekyll, on top of the fact that they are already baked into it.

Here’s the code for that extension:

require "active_support"  
require "cgi"  
require File.join(File.dirname(__FILE__), "../config/environment")

module Jekyll  
  module Filters
    def urlencode(input)
      CGI::escape(input)
    end
    def canonical(input)
      # Collapse repeated slashes in the path only, so the "//" in "http://" is left intact.
      "http://vitobotta.com" + "/#{ input.sub(/\/index\.html$/, "") }/".gsub(/\/{2,}/, "/")
    end
  end

  AOP.around(Site, :site_payload) do |site, args, proceed, abort|
    result   = proceed.call

    topics   = site.categories.map{|topic| { "title" => topic[0]["title"], "slug" => topic[0]["slug"], "autoslug" => topic[0]["autoslug"], "posts" => topic[1] } }.sort{|a,b| b["posts"].size <=> a["posts"].size }

    archives = site.posts.inject(ActiveSupport::OrderedHash.new) do |archives, post| 
      year, month, day = post.date.year, post.date.month, post.date.day

      archives[year] ||= ActiveSupport::OrderedHash.new
      archives[year][month] ||= ActiveSupport::OrderedHash["title", Date::MONTHNAMES[month], "days", ActiveSupport::OrderedHash.new, "post_count", 0, "slug", month.to_s.rjust(2,'0')]
      archives[year][month]["post_count"] += 1
      archives[year][month]["days"][day] ||= []
      archives[year][month]["days"][day] << post
      archives
    end

    result["site"]["topics"]   = topics
    result["site"]["archives"] = archives
    result["site"]["static"] = STATIC_LOCATION

    result
  end
end  

Basically it helps me with two things:

  • it registers the custom Liquid functions canonical and urlencode which we’ve seen earlier
  • it creates a hierarchical data set with the posts organised by year, month, day, that I can use in my pages. It’s just the way I am using the data at the moment, so there may be better alternatives, but for me it works.
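
For reference, the year/month/day grouping can also be sketched without ActiveSupport: since Ruby 1.9 a plain Hash preserves insertion order, so the structure below keeps years, months and days in the order the posts are visited. ArchivePost is a stand-in for Jekyll's post objects and build_archives is a name made up for this example, so take it as an illustration rather than a drop-in replacement for the extension.

```ruby
require "date"

ArchivePost = Struct.new(:title, :date)  # stand-in for a Jekyll post

# Groups posts into archives[year][month]["days"][day] => [posts].
def build_archives(posts)
  posts.each_with_object({}) do |post, archives|
    year, month, day = post.date.year, post.date.month, post.date.day
    months = (archives[year] ||= {})
    entry  = (months[month] ||= {
      "title"      => Date::MONTHNAMES[month],
      "days"       => {},
      "post_count" => 0,
      "slug"       => month.to_s.rjust(2, "0")
    })
    entry["post_count"] += 1
    (entry["days"][day] ||= []) << post
  end
end

# Hypothetical posts, newest first as they would appear on a blog.
posts = [
  ArchivePost.new("Second", Date.new(2011, 2, 7)),
  ArchivePost.new("First",  Date.new(2011, 1, 15))
]

archives = build_archives(posts)
puts archives[2011].keys.inspect  # months in insertion order
puts archives[2011][2]["title"]
```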
Contact form

Besides comments, another typical feature of a blog that is basically lost when switching to Jekyll because of the lack of both a database and a server side technology, is that of a contact form.

You could outsource a contact form too, however for added privacy I have preferred adding the support for dynamic content to my static Jekyll blog, through Sinatra. Jekyll generates a completely static site, so it is a no brainer to integrate the two things; also, besides the contact form I may need some dynamic action for something else too in the future, so it’s useful to have Sinatra available while the normal blog is still a super fast static site.

Integrating Sinatra is pretty easy: Sinatra expects all the static content it serves to clients to be in the public folder. As mentioned in the configuration section, Jekyll by default generates the static site in _site, but it is easy to change this by setting the destination option in the _config.yml configuration file.

You could change the location of Sinatra’s public folder instead, but since -as we’ll see in the next section- having Sinatra makes Capistrano a natural choice for deployment, and because Capistrano expects the folders images, stylesheets and javascripts to be in public, it’s just easier to change the destination option in Jekyll instead.

Back to the contact form… there are surely various ways of doing this, but at the moment what I do is have an iframe in the contact page that renders the actual contact form, dynamically rendered by Sinatra. Nobody likes iframes, but for the time being it’ll do.

Conclusions

As you can easily guess, this post took me ages to write, but I wanted to write a useful and up to date reference on how to migrate to Jekyll and how to use it, since I am very happy so far with it and I bet an increasing number of people will prefer its approach over WordPress and other heavier solutions.

I think I have covered pretty much all that anyone needs to go from getting started with Jekyll to doing most things with it, but I am sure there is more to say on the subject. I’ll leave it here for now, but I am already thinking of a couple more posts on Jekyll that I am sure many will find useful; in particular, a more in-depth look at various deployment options (besides just copying the static site, as mentioned), how to integrate a simple Sinatra-powered contact form, and some maintenance tips. So keep an eye on this blog if you are interested in knowing more about these topics.

In the meantime, I hope you’ll find in this post the information you were looking for, and if you have questions, suggestions or any thoughts, don’t be shy and let me know in the comments.