Detects the programming language used in files by several methods, based on GitHub's Linguist.
GitHub's Linguist, but in Crystal.

Linguist uses several strategies to determine the programming language of each file. The result can be used for language statistics or for syntax highlighting.
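As a rough illustration of what chaining several strategies can look like, the sketch below tries an exact filename match first and falls back to the file extension. It is self-contained and does not use this shard's API: `FILENAMES`, `EXTENSIONS`, and `guess_language` are hypothetical names invented for the example, and the tiny tables stand in for real language data.

```crystal
# Hypothetical lookup tables; the real data would come from languages.yml.
FILENAMES = {
  "Makefile"   => "Makefile",
  "Dockerfile" => "Dockerfile",
}

EXTENSIONS = {
  ".cr" => "Crystal",
  ".rb" => "Ruby",
  ".py" => "Python",
}

# Try an exact filename match first, then fall back to the extension.
# Returns nil when neither strategy recognises the path.
def guess_language(path : String) : String?
  FILENAMES[File.basename(path)]? || EXTENSIONS[File.extname(path).downcase]?
end

p guess_language("src/linguist.cr") # => "Crystal"
p guess_language("Makefile")        # => "Makefile"
p guess_language("notes.unknown")   # => nil
```

Ordering the strategies from most to least specific keeps a file such as `Makefile`, which has no extension, from falling through unrecognised.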

For now, only filename, extension, and classifier matching are implemented, but the languages.yml format and the samples format from GitHub's Linguist are both supported. We hope to add the rest soon, such as heuristics and shebang filtering.
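To make the languages.yml format more concrete, here is a minimal sketch that reads a small inline excerpt in that layout with Crystal's standard `yaml` module and builds extension and filename lookup tables. It is not how this shard loads its data: `SAMPLE`, `by_extension`, and `by_filename` are names made up for the example, and the excerpt contains only a few of the keys the real file uses.

```crystal
require "yaml"

# A tiny excerpt in the languages.yml layout: one top-level key per language.
SAMPLE = <<-DATA
Crystal:
  type: programming
  extensions:
  - ".cr"
Ruby:
  type: programming
  extensions:
  - ".rb"
  filenames:
  - Rakefile
DATA

by_extension = {} of String => String
by_filename = {} of String => String

# Walk every language entry and index it by its extensions and filenames.
YAML.parse(SAMPLE).as_h.each do |name, attrs|
  if exts = attrs["extensions"]?
    exts.as_a.each { |ext| by_extension[ext.as_s] = name.as_s }
  end
  if files = attrs["filenames"]?
    files.as_a.each { |file| by_filename[file.as_s] = name.as_s }
  end
end

p by_extension[".cr"]?     # => "Crystal"
p by_filename["Rakefile"]? # => "Ruby"
```

Both keys are optional per language, which is why the sketch checks for their presence before iterating.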

We cannot promise that the loaded data in ./data is up to date. If you want to be sure, train it again with overwrite set to true.


  1. Add the dependency to your shard.yml:

        github: microgit-com/
  2. Run shards install


require "linguist"

If it does not work out of the box, set the path to languages.yml like this:

Linguist.configure do |settings|
  settings.path = "./config/linguist/languages.yml"
end

The languages.yml file can be found in this project's git repository, or a more up-to-date version in GitHub's Linguist repository at

Using repository

repo ="./")
linguist =
linguist.with_repo(repo, repo.head.target_id)

logger =

langs = linguist.languages langs


We have this todo:

  • [x] Repository blob support
  • [x] Classifier
  • [x] Filename-finder
  • [x] Extension-finder
  • [ ] Heuristics support
  • [ ] Shebang filter support
  • [ ] Simple file text check without a repository


  1. Fork it (
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request


  github: microgit-com/
  version: ~> 0.2.2
